Guidance for Evaluating Humanitarian Assistance in Complex Emergencies

Organisation for Economic Co-operation and Development, Development Assistance Committee 1999




The standard OECD/DAC evaluation criteria of efficiency, effectiveness, impact, sustainability and relevance are broadly appropriate for humanitarian assistance programmes.

Efficiency measures the outputs - qualitative and quantitative - in relation to the inputs. This generally requires comparing alternative approaches to achieving the same outputs, to see whether the most efficient process has been used. Cost-effectiveness is a broader concept than efficiency in that it looks beyond how inputs were converted into outputs, to whether different outputs could have been produced that would have had a greater impact in achieving the project purpose.
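The distinction between efficiency (cost per output) and cost-effectiveness (cost per unit of impact) can be sketched with a small numerical comparison. All figures below are invented for illustration; they are not drawn from the guidance.

```python
# Illustrative sketch: two hypothetical approaches with the same budget.
# Efficiency compares cost per output (tonnes delivered); cost-effectiveness
# compares cost per unit of impact (households reaching adequate intake).
approaches = {
    "direct food distribution": {"cost": 500_000, "tonnes_delivered": 1_000,
                                 "households_with_adequate_intake": 8_000},
    "cash for work":            {"cost": 500_000, "tonnes_delivered": 600,
                                 "households_with_adequate_intake": 9_500},
}

for name, a in approaches.items():
    cost_per_tonne = a["cost"] / a["tonnes_delivered"]                  # efficiency
    cost_per_household = a["cost"] / a["households_with_adequate_intake"]  # cost-effectiveness
    print(f"{name}: {cost_per_tonne:.0f} per tonne, "
          f"{cost_per_household:.2f} per household with adequate intake")
```

On these invented figures, direct distribution is the more efficient way of moving food, yet cash for work is the more cost-effective route to the programme purpose, which is the kind of divergence the paragraph above describes.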

Effectiveness measures the extent to which the activity achieves its purpose, or whether this can be expected to happen on the basis of the outputs. Implicit within the criterion of effectiveness is timeliness: if the delivery of food assistance is significantly delayed, the nutritional status of the target population will decline. Given its importance in the assessment of emergency programmes, there is value in using timeliness more explicitly as one of the standard criteria. Similarly, issues of resourcing and preparedness should be addressed.

Impact looks at the wider effects of the project - social, economic, technical, environmental - on individuals, gender and age-groups, communities, and institutions. Impacts can be immediate and long-range, intended and unintended, positive and negative, macro (sector) and micro (household). Impact studies address the questions: what real difference has the activity made to the beneficiaries? How many have been affected?

Relevance is concerned with assessing whether the project is in line with local needs and priorities (as well as with donor policy). A recent evaluation of humanitarian assistance replaced the criterion of relevance with that of appropriateness - the need "to tailor humanitarian activities to local needs, increasing ownership, accountability, and cost-effectiveness accordingly" (Minear, 1994). However, the two criteria complement rather than substitute for each other. Relevance refers to the overall goal and purpose of a programme, whereas appropriateness is more focused on the activities and inputs. Expanding the criteria in this way draws attention to the fact that even where the overall programme goal is relevant - for example, to improve nutritional status - there are still questions to be asked about the programme purpose. Distributing large quantities of food aid may not be the best way of improving nutritional status. Alternatives could include food for work, cash for work, or measures to improve the functioning of local markets. Furthermore, even if distribution of food aid is deemed appropriate, it is still necessary to examine the appropriateness of the food that is distributed.

Sustainability - of particular importance for development aid - is concerned with measuring whether an activity or an impact is likely to continue after donor funding has been withdrawn. Projects need to be environmentally as well as financially sustainable. Many humanitarian interventions, in contrast to development projects, are not designed to be sustainable. They still need to be assessed, however, on whether, in responding to acute and immediate needs, they take the longer term into account. Larry Minear has referred to this as Connectedness, the need "to assure that activities of a short-term emergency nature are carried out in a context which takes longer-term and interconnected problems into account" (Minear, 1994). For example, otherwise efficient food distribution programmes can damage roads used by local traders, while the presence of large refugee camps can result in severe environmental impacts in neighbouring areas. Local institutions can also suffer - the high salaries paid by international NGOs can attract skilled staff away from government clinics and schools, leaving the local population with reduced levels of service. Large-scale relief programmes can also have a significant impact on local power structures, for better or for worse.

Coverage - the need "to reach major population groups facing life-threatening suffering wherever they are, providing them with assistance and protection proportionate to their need and devoid of extraneous political agendas". Minear alerts evaluators that complex emergencies and associated humanitarian programmes can have significantly differing impacts on different population sub-groups, whether these are defined in terms of ethnicity, gender, socio-economic status, occupation, location (urban/rural or inside/outside of a country affected by conflict) or family circumstance (e.g. single mother, orphan).

Programmes need to be assessed both in terms of which groups are included in a programme, and the differential impact on those included. For example, studies have shown that, in Ethiopia in the 1980s, more than 90% of international relief went to government-controlled areas, penalising those in areas of Tigray and Eritrea controlled by insurgent movements (Minear, 1994). Other studies have revealed that single mothers may be disadvantaged when it comes to access to resources, as they are unable to leave children to queue for relief goods. In the case of the Great Lakes emergency, it was found that the coverage of the response varied enormously: refugees and IDPs, and residents in neighbouring IDP camps, were often treated in quite different ways, despite having very similar needs (Borton et al. 1996).

Coherence - refers to policy coherence, and the need to assess security, developmental, trade and military policies as well as humanitarian policies, to ensure that there is consistency and, in particular, that all policies take into account humanitarian and human rights considerations. A notable lack of coherence was evident in the international community's response to the Great Lakes emergency in 1994. Military contingents were withdrawn from Rwanda during the genocide, when there is evidence to suggest that a rapid deployment of troops could have prevented many of the killings and the subsequent refugee influx into Zaire. This was then followed by a huge relief operation. In other instances, donor-imposed trade conditions have been blamed for precipitating economic crisis and conflict, undermining longer-term development policies. Coherence can also be analysed solely within the humanitarian sphere - to see whether all the actors are working towards the same basic goals. For example, there have been instances of one major UN agency promoting the return of refugees to their country of origin while another was diametrically opposed to such policies.

Finally, there is the important issue of co-ordination. This could be considered under the criterion of effectiveness, for a poorly co-ordinated response is unlikely to maximise effectiveness or impact. However, given the multiplicity of actors involved in an emergency response, it is important that co-ordination is explicitly considered - the intervention of a single agency cannot be evaluated in isolation from what others are doing, particularly as what may seem appropriate from the point of view of a single actor may not be appropriate from the point of view of the system as a whole.

Given the context of conflict and insecurity, protection issues are also critical to the effectiveness of humanitarian action. Where levels of protection are poor, it is possible that the target population of an otherwise effective relief distribution project is being killed by armed elements operating within the project area, or even within the displaced persons/refugee camp itself. Assessment of the levels of security and protection in the area of the project or programme and, where relevant, the steps taken to improve them should be part of all humanitarian assistance evaluations. In the humanitarian assistance evaluations undertaken to date, such issues have often been left out of the study or not adequately covered. Evaluation Managers' lack of familiarity with security and protection issues has often contributed to these omissions.

International agreements on standards and performance, such as the Red Cross/NGO Code of Conduct and the Sphere Project, as well as relevant aspects of international humanitarian law, provide international norms against which the performance of agencies and of the system as a whole may be assessed.



Preparing the narrative history and baseline has to be the starting point for any study. Expansion and modification of the narrative history and baseline will probably continue throughout the study as more information is obtained. Whilst documentation will be an important source of information, interviews with the range of actors and with members of the affected population will be a vital source too. Arguably, interviews form a more important source than is normally the case with evaluations of development assistance, due to the problems of poor record keeping and documentation. Effective management of the results of these interviews is an important determinant of the effectiveness of the team. Generally, different members of the team should take responsibility for interviewing different individuals in different locations. However, whilst rational in terms of time and travel, such divisions of labour require the development of standard protocols to be used by all members of the team in their separate interviews. Such protocols should include relatively open-ended questions such as "What key lessons did you learn from your experience?", "What in your view were the main strengths of the operation?" and "What would you change if you had to do it all again?". Team members would need to be disciplined in their adherence to these protocols and in writing up and sharing interview records among team members. The use of laptop computers, e-mail and free-form databases is potentially a very effective means of sharing such information, enabling all team members to contribute to and benefit from the process of constructing the narrative and baseline.
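The shared interview-record protocol described above can be sketched as a simple data structure. The field names, questions and merge logic here are assumptions for illustration; the guidance does not prescribe any particular format or tool.

```python
# Minimal sketch of a standard interview-record format for a team of
# evaluators working in separate locations, pooled into one evidence base.
from dataclasses import dataclass, field

# Open-ended questions from the standard protocol (quoted in the text above).
PROTOCOL_QUESTIONS = [
    "What key lessons did you learn from your experience?",
    "What in your view were the main strengths of the operation?",
    "What would you change if you had to do it all again?",
]

@dataclass
class InterviewRecord:
    interviewer: str
    interviewee: str            # role or anonymised identifier, for confidentiality
    location: str
    date: str                   # ISO date, e.g. "1996-03-01"
    answers: dict = field(default_factory=dict)  # question -> summary of response

def merge_records(*team_member_records):
    """Pool each team member's records into one chronologically sorted list."""
    pooled = [r for records in team_member_records for r in records]
    return sorted(pooled, key=lambda r: (r.date, r.location))
```

Each team member keeps their own list of records; merging them periodically gives the whole team a common, ordered evidence base from which to build the narrative history and baseline.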

The strength of multidisciplinary teams lies in the differing perspectives that can be brought to bear on the issues and it is vital that the team has sufficient opportunities to get together at different stages of the evaluation process to discuss overlapping issues and conclusions.

Interviews with a sample of the affected population should be a mandatory part of any humanitarian assistance evaluation. Humanitarian assistance is essentially a top-down process. Of necessity it often involves making assumptions about assistance needs and the provision of standardised packages of assistance. Even where time and the situation permit, humanitarian agencies are often poor at consulting or involving members of the affected population and beneficiaries in the assistance provided to them. Consequently, there can often be considerable discrepancy between the agency's perception of its performance and the perceptions of the affected population and beneficiaries. Experience shows that interviews with beneficiaries can be one of the richest sources of information in evaluations of humanitarian assistance. The use of Rapid Rural Appraisal and Participatory Rural Appraisal techniques can be very helpful in selecting members of the affected population to be interviewed and in structuring the interviews. A combination of interviews with individual households, women's groups and open group discussions involving men as well has proven to be very productive in some contexts. However, in the context of recent or ongoing conflicts such a process may need to be modified. Ensuring the confidentiality of some interviews with individuals may be necessary. Deliberately seeking out those who did not benefit from the assistance available can also be fruitful, as it may reveal problems with the targeting and beneficiary selection processes used by the agencies. Ideally, anthropologists familiar with the culture and the indigenous language will undertake this work.



It has already been noted that one of the strengths of a multi-disciplinary team is the differing perspectives it can bring to bear on issues. All team members should, therefore, be involved in discussing the findings and linking these to conclusions. Tensions which may arise between the team leader and individual subject specialists on the nature of the conclusions can be managed more effectively if the whole team is brought together to discuss findings and conclusions. Ideally the team should hold workshops to discuss their preliminary findings on return from fieldwork (where they have not been working together in the field) and then subsequently to discuss comments received on the draft report and any new information provided by the agencies.

Whatever its scope or nature, an evaluation report will maximise its potential impact if it presents its findings, conclusions, recommendations and follow-up sections separately. If readers disagree with the recommendations (or find themselves unable to implement them because of political constraints), they may still be able to agree with the findings or conclusions. Comprehensive discussions on the draft report within the report's target audience are likely to increase their ownership of the report and the likelihood of its acceptance and follow-up.

In any large-scale evaluation, conflict over the nature of the recommendations is probably inevitable. In preparing recommendations, in order to minimise such conflict, there needs to be a clear link between the recommendations and the evidence in the body of the report to support them.

The form in which recommendations should be made is the subject of continuing debate among evaluation specialists. Some would argue that evaluation reports should contain specific, implementable recommendations detailing the actions agencies should take in order to improve future performance. Such recommendations might also spell out who is responsible for implementing each recommendation and who is responsible for monitoring whether this action takes place. This approach has the benefit of making the responsibility for implementation and follow-up clear, and reduces the opportunity for organisational evasion and fudging. However, others would urge caution, favouring findings and conclusions over specific recommendations, so as not to burden policy-makers with recommendations that could lead to unimplementable policies. If recommendations are required, an evaluation team might provide policy-makers with options rather than a single recommendation, together with an analysis of expected consequences. Different issues may require different responses. Technical issues may lend themselves to specific recommendations in the final report. In dealing with broader issues it may be useful to deliver the analysis to a workshop of decision-makers and evaluators which negotiates follow-up action.



Evaluation reports also need to be sold. Bureaucrats, field officers and agency staff need to be enthused, excited and convinced that the evaluation report is important and should be read. While selling the report is more the responsibility of the management group than the evaluation team, marketing strategies could be included in negotiated follow-up actions in order to help steering committee members sell the evaluation report within their own organisation.

Large, system-wide evaluations raise issues relating to a diverse range of organisations, and compliance cannot be compelled. Monitoring of follow-up action is therefore important because it provides a level of accountability which is otherwise missing. A well-resourced and well-structured monitoring process can strongly influence agencies to account for their response (or lack of it) to the evaluation report.

It is highly desirable for Evaluation Managers to establish a mechanism whereby decisions relating to the conclusions and recommendations of an evaluation are formally recorded, with an explanation of what action is to be taken and who is to be responsible. Where no action is deemed appropriate, this would need to be justified. Evaluation departments would then have a basis for monitoring whether the agreed actions are undertaken.