Network Management Planning - SolarWinds Certified Professional

Uploaded by solarwindsinc on 16.06.2009

Section 2 of the SolarWinds Certified Professional (SCP) program covers
Network Management Planning.
Network Management Planning includes things like translating business requirements
into monitoring needs, thresholds, and Orion NPM configurations,
building and leveraging reports that meet the needs of various stakeholders,
determining your monitoring scope and the impact on the network, and of course determining
the impact of network topology
on the monitoring system.
Translating business requirements means doing things like surveying key
stakeholders and departments for their monitoring and reporting needs.
It also means documenting goals for your NMS.
I really can't stress enough how important it is to document the goals for
your network management system.
As a matter of fact, I recently worked with a very large insurance organization,
and their goals were to increase availability of the network
by several orders of magnitude.
However, they didn't fully document their goals and they didn't baseline the
network before they started,
so six months and a year later they really didn't have a good idea of how well they had
done in their project. So be sure to both document your goals
and take a current baseline when you start deploying the system, so you know what you are dealing with
and can adequately measure either improvement or, hopefully not, degradation.
Key metrics also include things like data granularity,
which means how often you collect the data. If you're collecting
data on bandwidth utilization, for instance,
and you're only polling every five minutes,
then you might miss short spikes
within a five-minute interval, and I'll talk a little bit more about how that
works in just a moment.
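To make the granularity point concrete, here is a small Python sketch. It assumes hypothetical per-minute utilization samples and assumes that each poll reports the average over its interval, which is what you get when a rate is computed from the change in an interface counter between two polls. A one-minute burst to 95% shows up as only about 35% in a five-minute average.

```python
# Hypothetical per-minute utilization samples (percent) for one interface.
# Minutes 0-4 form one five-minute polling window; minute 2 contains a burst.
per_minute = [20, 22, 95, 21, 19, 18, 20, 23, 22, 21]

def window_averages(samples, window):
    """Average consecutive samples into fixed windows, as a coarser poll would see them."""
    chunks = [samples[i:i + window] for i in range(0, len(samples), window)]
    return [sum(chunk) / len(chunk) for chunk in chunks]

print(max(per_minute))                 # peak visible with one-minute polling
print(window_averages(per_minute, 5))  # what five-minute averages report
```

The spike is still "in" the five-minute data, but it has been diluted by the quiet minutes around it, so a threshold set at, say, 80% would never fire.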
The other thing to keep in mind when it comes to
translating business requirements into monitoring needs
is data retention.
Data retention means how long you'll keep the data
that your NMS collects before the data is either deleted
or summarized, which means rolled up into less detailed statistics.
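As a rough illustration of what "summarized" means, the following Python sketch keeps recent samples at full detail and rolls older ones up into one minimum/average/maximum per hour. The one-day detail window and hourly buckets are hypothetical; Orion's actual retention and summarization settings are configured in the product, and this is not its implementation.

```python
from datetime import datetime, timedelta
from statistics import mean

def summarize_hourly(samples, now, detail_days):
    """Split (timestamp, value) samples at a retention cutoff.

    Samples newer than `detail_days` are kept at full detail; older ones are
    rolled up into one (min, avg, max) tuple per clock hour.
    """
    cutoff = now - timedelta(days=detail_days)
    detailed = [(ts, v) for ts, v in samples if ts >= cutoff]
    buckets = {}
    for ts, v in samples:
        if ts < cutoff:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets.setdefault(hour, []).append(v)
    rolled_up = {hour: (min(vs), mean(vs), max(vs)) for hour, vs in sorted(buckets.items())}
    return detailed, rolled_up

now = datetime(2009, 6, 16, 12, 0)
samples = [
    (datetime(2009, 6, 10, 9, 0), 10),    # old: will be rolled up
    (datetime(2009, 6, 10, 9, 30), 30),   # old: same hour as above
    (datetime(2009, 6, 16, 11, 50), 50),  # recent: kept as-is
]
detailed, rolled_up = summarize_hourly(samples, now, detail_days=1)
print(detailed)
print(rolled_up)
```

Keeping min and max alongside the average is one way to avoid losing spikes entirely once the detail is gone.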
Now other things you'll want to do when you're documenting these monitoring needs
are to define both the needs of the teams and the thresholds.
Thresholds are the levels at which you want to start seeing
error states
or paying more attention.
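A threshold pair can be sketched in Python as a simple classification. The 70% warning and 90% critical CPU-load values below are hypothetical stand-ins for whatever levels your teams agree on; in Orion these are set through its own threshold settings, not through code like this.

```python
def classify(value, warning, critical):
    """Return 'ok', 'warning', or 'critical' for a metric where higher is worse."""
    if warning >= critical:
        raise ValueError("warning threshold must be below critical threshold")
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"

# Hypothetical CPU-load thresholds (percent) agreed with the server team.
CPU_WARNING, CPU_CRITICAL = 70, 90
for load in (45, 78, 96):
    print(load, classify(load, CPU_WARNING, CPU_CRITICAL))
```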
That will come into play as you are building your alerts
and setting up all your different reports, and of course there are several
different settings within the Orion Network Performance Monitor
and different configuration options you'll need to know.
You can find more information about that in later educational videos in the
Orion NPM administration section, and also within the Orion NPM
administrator's guide on the website under support and documentation.
Another key area of this section is knowing how to build and leverage
reports. The Orion Report Writer makes it quite easy to build reports,
but you really need to understand what types of reports to build
and which reports are used for what.
Now baselines, as we mentioned before, are especially helpful when you're doing
capacity planning
or SLA management.
You also want to think about things like availability,
which means
how much of the time the network is up, or how you measure your uptime,
as well as performance and capacity planning.
Now these things are very, very important, and when you're dealing with availability,
one of the things you want to understand is the term five nines, or maybe four nines.
Five nines availability simply means
the network is up 99.999 percent of the time,
which roughly equates to about five minutes of downtime per year,
which really isn't very much.
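The "nines" arithmetic is easy to check. This Python sketch computes the maximum yearly downtime for a given number of nines, using a 365-day year (leap years ignored). Five nines works out to about 5.26 minutes per year, and four nines to about 52.6 minutes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years

def downtime_minutes_per_year(nines):
    """Maximum yearly downtime allowed by an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return unavailability * MINUTES_PER_YEAR

for n in (3, 4, 5):
    availability = 1 - 10 ** -n
    print(f"{availability:.{n - 2}%} available -> {downtime_minutes_per_year(n):.2f} min/year")
```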
Another thing you want to keep in mind, and an important part of this section of
the test, is determining the scope of monitoring
and the impact that the network management system will actually
have on the network. Some things to keep in mind, for instance, are the polling frequency.
What that means is that the more often you poll, or the more often you collect data,
the more granular and the more detailed your statistics are.
However, that also means
that you have a larger impact on the network. There's more
network management traffic on the network, and
there's a higher load created on the devices
that you're managing. Now, the advantage again
is that the more often you're polling, the more detailed your data is
and the quicker you can detect issues. However, the downside is
that it can create extra traffic on the network. Most modern network management systems and
most modern network devices are able to handle a significant amount of
management traffic,
so it's not too big a deal. This is also one of the key reasons you want to
leverage both event-based network management protocols, which offer real-time,
unsolicited feedback, and polling-based (query-based) protocols that provide query-and-answer
feedback.
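To get a feel for the traffic side of the trade-off, here is a back-of-the-envelope Python sketch. The element count and the roughly 200 bytes per request/response exchange are hypothetical figures; measure your own polls to get real numbers. The point it shows is that traffic scales linearly with polling frequency, so going from five-minute to one-minute polling multiplies the load by five.

```python
def polling_bps(elements, bytes_per_poll, interval_s):
    """Average polling traffic in bits per second.

    Assumes every element is polled once per interval and that request plus
    response together cost `bytes_per_poll` bytes on the wire.
    """
    return elements * bytes_per_poll * 8 / interval_s

# Hypothetical figures: 1,000 monitored elements, ~200 bytes per poll exchange.
for interval in (300, 60):
    print(f"{interval:>3} s interval: {polling_bps(1000, 200, interval):,.0f} bit/s")
```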
Now less polling means less bandwidth usage
and lower storage requirements, which is another important thing to note,
because you need to be thinking about
how much data you will want to keep in terms of the disk requirements for
your SQL Server, which stores the data that Orion collects.
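A similar rough estimate works for disk sizing. The Python sketch below multiplies elements, rows per day, retention days, and an assumed row size. The 100 bytes per row and 30 days of detail are hypothetical, and real SQL Server usage also includes indexes and summarized tables, so treat the result as a lower-bound sanity check rather than a sizing formula.

```python
SECONDS_PER_DAY = 86_400

def storage_bytes(elements, interval_s, retention_days, bytes_per_row):
    """Rough size of detailed statistics kept before summarization."""
    rows_per_day = SECONDS_PER_DAY // interval_s  # assumes interval divides a day evenly
    return elements * rows_per_day * retention_days * bytes_per_row

# Hypothetical: 1,000 elements, 30 days of detail, ~100 bytes per stored row.
for interval in (300, 60):
    gb = storage_bytes(1000, interval, 30, 100) / 1e9
    print(f"{interval:>3} s interval: {gb:.2f} GB")
```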
Last but not least, you will want to be able to determine the impact that topology has
on network monitoring.
What that simply means is that when the network is displayed
in a map or any sort of topological representation,
you can see
how an outage might affect your ability to collect data.
For example, if you have devices in a remote site you want to collect statistics on, but
the link to the remote site goes down, then you are no longer able to collect the
statistics from that site. So you might need to roll out a dedicated network
management collector at that site to do that collection,
especially in an environment with multiple data centers. In those cases,
many customers will roll out multiple copies of Orion, one in each data center, and
then roll them up with the Enterprise Operations Console, or EOC.
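The remote-site example can be modeled as a graph reachability problem. This Python sketch runs a breadth-first search from a poller over the links that are still up; the topology and device names are hypothetical. With the WAN link down, the central poller loses both remote devices, while a collector at the remote site (modeled here as polling from remote-sw) can still see them.

```python
from collections import deque

def reachable(links, poller, down=frozenset()):
    """Breadth-first search from `poller` over links that are not in `down`."""
    graph = {}
    for a, b in links:
        if (a, b) in down or (b, a) in down:
            continue  # a failed link carries no polling traffic
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {poller}
    queue = deque([poller])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical topology: a central Orion server and one remote site behind a WAN link.
links = [
    ("orion", "core-sw"),
    ("core-sw", "dc-sw"),
    ("core-sw", "wan-rtr"),
    ("wan-rtr", "remote-rtr"),
    ("remote-rtr", "remote-sw"),
]
all_nodes = {node for link in links for node in link}
wan_down = {("wan-rtr", "remote-rtr")}

print(sorted(all_nodes - reachable(links, "orion", wan_down)))  # lost from the central poller
print(sorted(reachable(links, "remote-sw", wan_down)))          # still visible to a local collector
```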
Now remember, if you can't get to it, you can't monitor it,
so it's really important
to keep track of the topology, because you can infer reachability status
from those topology changes. Other things that can affect reachability are,
for instance, access control lists (ACLs) and firewall rules.
Many times on networks that aren't being managed,
the security engineers have blocked things like SNMP and ICMP, which are
really your two primary network management protocols.
So as you start to audit the network and prepare to roll out your network management
system, you need to pay attention to where that traffic is allowed. A best practice
is to roll out a management VLAN
and update your ACLs and firewall rules to allow access
via the management protocols
from anywhere on that VLAN.
Now some customers go ahead and limit that to only the specific IP addresses
of their management servers,
but over the long run that can cause you a lot of extra work, because as you
add more network management tools, applications, and servers, you continually
have to update those access lists and firewall rules. So you really want to pay attention to that and,
again, if you can,
deploy a management VLAN or a management subnet instead.
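The maintenance difference between a per-host rule and a management-subnet rule can be shown with Python's standard ipaddress module. The 10.99.0.0/24 management VLAN and the host addresses are hypothetical, and this models only the source-address match of an ACL entry, not any particular router or firewall syntax.

```python
import ipaddress

def permitted(src_ip, allowed_networks):
    """True if src_ip falls inside any network an ACL rule allows."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in allowed_networks)

# Hypothetical addressing: management VLAN 10.99.0.0/24, original Orion server 10.99.0.10.
vlan_rule = [ipaddress.ip_network("10.99.0.0/24")]
host_rule = [ipaddress.ip_network("10.99.0.10/32")]

new_tool = "10.99.0.25"  # a management server added later on the same VLAN
print(permitted(new_tool, vlan_rule))  # True: allowed without touching the ACL
print(permitted(new_tool, host_rule))  # False: blocked until the ACL is edited
```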
Also, you want to pay attention to different security zones.
Especially in a company that has an internet presence, you may have devices
that are inside the firewall, outside it,
and in a DMZ. Identifying these security zones and how your
network management system
will have to interact with them is a key part of understanding
how you can deploy the network management system, and of your overall reachability and
management strategy.
Last but not least, overlapping and non-routable address space can be a real
headache when it comes to network management systems.
You want to audit the network, understand which addresses are reachable from where,
and really take that into account as you start documenting the systems and
rolling out the network management system. In many cases, if you have overlapping address
space and you have devices on each of those overlapping subnets
that you need to monitor, you will need a separate Orion polling engine
for each duplicate address space zone.
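Part of that audit can be automated once you have an inventory of which subnets live at which site. This Python sketch uses the standard ipaddress module to flag every pair of sites whose address ranges overlap; each flagged pair is a place where one poller could not address both sides unambiguously, which is where a separate polling engine would come in. The site names and subnets are hypothetical.

```python
import ipaddress
from itertools import combinations

def overlapping_sites(site_subnets):
    """Return pairs of sites whose address ranges overlap anywhere."""
    pairs = []
    for (site1, nets1), (site2, nets2) in combinations(site_subnets.items(), 2):
        if any(ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))
               for a in nets1 for b in nets2):
            pairs.append((site1, site2))
    return pairs

# Hypothetical inventory: the acquired company's /23 covers branch A's /24.
sites = {
    "hq": ["10.1.0.0/16"],
    "branch-a": ["192.168.1.0/24"],
    "acquired-co": ["192.168.0.0/23"],
}
print(overlapping_sites(sites))
```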