[ivtv-users] hvr-1600 occasionally goes red-screen

Andy Walls awalls at radix.net
Sun Jan 17 19:56:07 CET 2010


On Tue, 2010-01-12 at 11:36 -0500, Dale E. Pontius wrote:
> Reply interspersed
> 
> Dale
> Andy Walls wrote:
> > On Sun, 2010-01-10 at 21:41 -0500, Dale Pontius wrote:
> >   
> >> Every now and then my hvr-1600 goes red-screen on me.  (Apparently this
> >> indicates no signal, so presumably something has gone south in the
> >> front-end selection logic.) From the lookup I've done, it's a symptom of
> >> no signal. 
> >>     
> >
> > I wouldn't say that "red screen" means "no signal".  I would say that
> > "red screen", with buffers still being DMA'ed properly from the CX23418,
> > means the analog front end/digitizer in the CX23418 stopped operating
> > normally.
> >
> >   
> Interesting, I have to think about that one.  If I grep for "cx18" in my
> logs nothing pops up other than a normal initialization.  Are there any
> flags I should use when loading the module that would deliver helpful info?

No, not really.

You will find 

	v4l2-ctl -d /dev/video0 --log-status

interesting.  It logs the state of the CX23418's integrated A/V
decoder/digitizer on the lines that start 'cx18-0 843'.

   [ 2774.754404] cx18-0 843: Video signal:              not present
   [ 2774.754412] cx18-0 843: Detected format:           NTSC-M
   [ 2774.754418] cx18-0 843: Specified standard:        NTSC-M
   [ 2774.754424] cx18-0 843: Specified video input:     Composite 7
   [ 2774.754430] cx18-0 843: Specified audioclock freq: 48000 Hz
   [ 2774.754452] cx18-0 843: Detected audio mode:       mono
   [ 2774.754460] cx18-0 843: Detected audio standard:   no detected audio standard
   [ 2774.754466] cx18-0 843: Audio muted:               yes
   [ 2774.754473] cx18-0 843: Audio microcontroller:     running
   [ 2774.754479] cx18-0 843: Configured audio standard: automatic detection
   [ 2774.754486] cx18-0 843: Configured audio system:   BTSC
   [ 2774.754492] cx18-0 843: Specified audio input:     Tuner (In8)
   [ 2774.754498] cx18-0 843: Preferred audio mode:      stereo
   [...]
   [ 2774.760106] cx18-0: Video Input: Tuner 1
   [ 2774.760109] cx18-0: Audio Input: Tuner 1

Note the "Video signal: not present", which indicates I don't have any
valid video hooked up to the current input (the analog tuner), but that
the audio microcontroller is running (it only runs for the analog tuner
input).  The microcontroller keeps the audio muted unless it detects a
valid sound system in the sound IF from the tuner.

Your big hint that the digitizer is hosed will be this status output
showing something unexpected or nonsensical.
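If you want to check this status from a script (say, after every reboot or once an hour), a small parser for the 'cx18-N 843' lines helps. Below is a minimal Python sketch. It assumes the kernel-log lines have already been captured (the sample above carries dmesg-style timestamps) and that the field names match that sample. `parse_843_status` and `suspicious_fields` are hypothetical helpers. Comparing "Detected format" against "Specified standard" is only one possible heuristic; the driver does not define it.

```python
import re

# Matches the digitizer lines of a --log-status dump, e.g.
#   "[ 2774.754404] cx18-0 843: Video signal:              not present"
# The "cx18-N 843:" prefix is taken from the sample output; lines such as
# "cx18-0: Video Input: Tuner 1" deliberately do not match.
STATUS_RE = re.compile(r"cx18-\d+ 843:\s*(?P<key>[^:]+):\s*(?P<value>.*?)\s*$")


def parse_843_status(log_text):
    """Return {field: value} for every 'cx18-N 843:' line in log_text."""
    status = {}
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            status[match.group("key").strip()] = match.group("value")
    return status


def suspicious_fields(status):
    """Hypothetical heuristic: list states that look wrong for a tuned input."""
    problems = []
    if status.get("Video signal") == "not present":
        problems.append("video signal not present")
    detected = status.get("Detected format")
    specified = status.get("Specified standard")
    if detected and specified and detected != specified:
        problems.append(f"detected format {detected} differs from {specified}")
    return problems
```

An empty list from `suspicious_fields` does not prove the digitizer is healthy. It only means none of these particular checks fired.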

> >>  For a while I blamed it on my cheapo splitter/amp, because
> >> when it happened I'd take things down, move the input to another
> >> channel, and things would work, again.  Recently I installed a much
> >> better distribution amp.
> >>
> >> Today it happened again.  I guess I need to sort out symptoms better,
> >> but from what I can tell, once it goes red-screen, a simple reboot
> >> doesn't clear things up, though I need to verify this.  I'm going to
> >> presume that rmmod/modprobe wouldn't clear things either, if a reboot
> >> didn't.  These facts are a bit murky, from the old splitter/amp, but I
> >> do remember that once an input went, it stayed gone until I fiddled with
> >> the hardware.
> >>
> >> Tonight when that happened, I realized that fiddling with the hardware
> >> also meant really disconnecting the power, and that means the 5VSB (5V
> >> standby - on when the front-panel power is off) too.  So I shut down,
> >> then turned off the switch on the back of the power supply to really
> >> power off, and waited about 20 minutes.  (I wanted to record "Serenity"
> >> and I thought it was at 6:00, but it was really at 6:30, so I could have
> >> given it 50 minutes.)  Anyway, all is well now.
> >>     
> >
> >
> > So it sounds like either:
> >
> > a. a resistor somewhere - power supply, motherboard, or HVR-1600 - has
> > open-circuited.  Once you build up enough charge on some capacitors
> > somewhere, the circuit stops working (oscillators for clock signals stop
> > oscillating).
> >
> > b. Your power supply voltages are sagging and you're getting CMOS
> > latch-up.  See section 4.3 of the publicly available CX25840/1/2/3
> > datasheet for a terse description of CMOS latch-up.
> >   
> Is there any way shy of probing like a maniac to find (a) above?

http://dl.ivtvdriver.org/datasheets/video/cx25840.pdf



> Incidentally, this situation has happened at some point or other on both
> HVR-1600s, though I'm not sure it has ever happened to both at the same
> time.

Well, all theories aside, there isn't really much that can be done in
software.  The two biggest workarounds I can recommend are:

1. Increasing 

	#define CX18_MAX_MMIO_WR_RETRIES 10

in linux/drivers/media/video/cx18/cx18-driver.h from 10 to some higher
value.  The aim is to ride out any CX23418 register write failures you
could be experiencing, so that the digitizer is always set up properly.
That said, retrying 10 times is usually enough for any system.

2. Blacklist the cx18 driver so it doesn't load at boot, when the PCI
bus is busy, and have a script load it shortly thereafter, when the PCI
bus isn't very busy.  Same objective here: reduce the number of PCI bus
write failures.
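Workaround 1 assumes, as the name CX18_MAX_MMIO_WR_RETRIES suggests, that the limit bounds a write-then-read-back loop: write the register, read it back, and try again if the value didn't stick. The following generic Python sketch of that pattern is not the cx18 code itself. `FlakyRegister` is a made-up stand-in for a register whose first few writes are lost on a busy bus. The sketch shows why a higher limit helps only when failures are transient.

```python
MAX_WR_RETRIES = 10  # plays the role of CX18_MAX_MMIO_WR_RETRIES


class FlakyRegister:
    """Hypothetical register whose first `lost_writes` writes are dropped,
    standing in for writes lost on a busy PCI bus."""

    def __init__(self, lost_writes):
        self.value = 0
        self.lost_writes = lost_writes

    def write(self, value):
        if self.lost_writes > 0:
            self.lost_writes -= 1  # this write never reaches the chip
            return
        self.value = value

    def read(self):
        return self.value


def write_expect(reg, value, max_retries=MAX_WR_RETRIES):
    """Write value, read it back, and retry until it sticks.

    Returns the number of attempts used; raises IOError if all fail."""
    for attempt in range(1, max_retries + 1):
        reg.write(value)
        if reg.read() == value:
            return attempt
    raise IOError(f"write of {value:#x} did not stick after {max_retries} tries")


print(write_expect(FlakyRegister(lost_writes=3), 0x1234))  # 4 attempts
```

If writes fail persistently rather than occasionally, no retry limit will rescue them. That fits the note above that 10 retries is usually plenty.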


>   As for (b), I have lm_sensors running, but never really trusted
> the values it gives much more than a buglight.  In other words, the 5V
> supply is within a few tenths of 5V, and I certainly don't believe the
> hundredths of a volt digit.  From my perspective, it's an instrument
> that has never been calibrated.  Am I being unfair?

No.  I don't know which sensor chip is used, so I can't speak to its
precision.  I might trust the 0.1 volt digit.  The accuracy is probably
crap, but that doesn't matter much as long as the sensor reads
consistently over time: if you know what's "normal", then you don't need
accuracy, just precision.
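One way to put "know what's normal" into practice: record readings while the card is known to be working, summarize them, and flag later readings that stray far from that baseline. This is a minimal Python sketch. The 3-sigma threshold and the sample voltages are illustrative assumptions, not measured values.

```python
from statistics import mean, stdev


def baseline(readings):
    """Mean and sample standard deviation of readings taken while the
    card was known to be working (needs at least two readings)."""
    return mean(readings), stdev(readings)


def out_of_family(readings, ref_mean, ref_sd, k=3.0):
    """Return (index, value) pairs more than k standard deviations away
    from the baseline mean; k = 3.0 is an arbitrary illustrative choice."""
    limit = k * ref_sd
    return [(i, v) for i, v in enumerate(readings) if abs(v - ref_mean) > limit]


# Illustrative +5V readings, not real measurements.
good_days = [4.95, 4.95, 4.94, 4.96, 4.95, 4.95]
m, sd = baseline(good_days)
print(out_of_family([4.95, 4.94, 5.10, 4.95], m, sd))  # prints [(2, 5.1)]
```

This relies only on the sensor's repeatability, which is the point: the absolute accuracy of the hundredths digit never enters into it.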

>   Are the
> measurements from lm_sensors better than I'm giving them credit for
> being?

It depends on the chip and the transducer really.  But since we're
talking about consumer PCs, skepticism about quality is rarely
ill-founded.

>   That said, I do believe that I could likely take repeated
> measurements and look for drift, etc.




> 
> As for latch-up, sounds like they don't have the best process/ESD
> design.  I've got 27 years in silicon design, and 4 years of test before
> that.  Forward-biasing a pin is certainly a bad thing to do, but decent
> layout rules and a really good ESD device shouldn't let latch-up happen.

OK.  27 years makes *you* the expert! ;)

I was just hypothesizing that a large voltage deviation might cause the
condition.  I suppose with proper voltage regulation and transient
suppression on a board, it shouldn't be a problem.


> > Some rhetorical questions:
> >
> > Was 20 minutes needed to get back to normal operation, or is some
> > smaller period of time also OK?
> >   
> I don't really know.  Since I'd been assuming that it was the fault of
> my old distribution amp, I hadn't considered the problem to be inside
> the computer case until now.  My assumption was "something needs to
> discharge," and the more time I could give it, the better.  My wife got
> off of the computer a little after 5:30, and I thought "Serenity" was on
> SciFi (can't SyFy) at 6:00, so 20 minutes was what I thought I had.  I
> hadn't gotten into a debug frame of mind, yet.  Had I realized that
> "Serenity" was really on at 6:30, I would have given it 50 minutes.

OK.  I was just wondering whether you could shorten your corrective-action
timeline.

If anything, a quick check of proper operation after every reboot may be
a good data point to collect.


> > How many minutes of continuous operation does it take to get the red
> > screen?  Is there a lot of variance in that number?  
> >   
> Again, I don't really know.  This is very infrequent, and since I was
> blaming a box I planned on replacing, I didn't bother to look for
> patterns.  Now I am.
> 
> As for the best I can tell about this incident, it may well have been on
> the fritz from boot, that morning.  When I found that the show I wanted
> to watch had the red screen, I checked something recorded earlier that
> day, and it was bad, too.  Obviously liveTV was bad.
> 
> Incidentally, a day or two after Christmas the northbridge fan failed. 
> It had failed before, but I was able to clean and oil it, and get it
> running again.  I also ordered a passive heatsink, but had not gotten
> around to installing it.  So with system and new heatsink I went to a
> friend's house Tuesday after Christmas, and we installed it on his
> static-safe bench.  The system has been powered up since, until the
> other day when I flipped the switch off for 20 min.
> 
> As mentioned, next time this happens I'll have my debug hat on, and
> get some much better data.  I'm still thinking that the 5VSB supply is
> holding some badness in there.

That was my guess.  PCI bus errors when reconfiguring between the tuner and
baseband inputs are always another possibility.

Also, the CX23418 is a computing system running firmware.  The firmware
could very well have a bug that gets triggered very rarely.


> >
> > A few more rhetorical questions:
> >
> > How old is your power supply?  Is it a name brand or off brand?
> >   
> I can't recall the brand - it's the second in this case.  The original
> PS was a 350W Antec that came with the case.  Some time after installing
> a second hard drive, the system started occasionally powering down.  I
> set up remote logging so that I could capture the last events before the
> crash that didn't make it onto disk, and found nothing unusual
> happening.  So I decided it must be the power supply.
> 
> I read reviews, found a brand working its way into the power supply
> market that was well reviewed, and got a 430W model.  I've been happy
> with it, though of course that doesn't mean that it's perfect, but it
> hasn't knowingly caused any problems.

Sounds like it should be fine.  If anything, a temporary overvoltage
would be the most likely power supply problem.


> A sensors readout follows, if it indicates anything.  (Keeping in mind
> my comments above.)  There are lots of alarms, but none of them look real.
> -------------------------------------
> user at localhost ~ $ sensors
> k8temp-pci-00c3
> Adapter: PCI adapter
> Core0 Temp:
>              +24 C
> 
> it8712-isa-0290
> Adapter: ISA adapter
> VCore 1:   +1.10 V  (min =  +0.00 V, max =  +4.08 V)   ALARM
> VCore 2:   +0.00 V  (min =  +0.00 V, max =  +4.08 V)   ALARM
> +3.3V:     +3.31 V  (min =  +0.00 V, max =  +4.08 V)   ALARM
> +5V:       +4.95 V  (min =  +0.00 V, max =  +6.85 V)   ALARM
> +12V:     +11.84 V  (min =  +0.00 V, max = +16.32 V)   ALARM
> -12V:      -4.78 V  (min = -27.36 V, max =  +3.93 V)   ALARM
> -5V:      -13.64 V  (min = -13.64 V, max =  +4.03 V)   ALARM
> Stdby:     +4.89 V  (min =  +0.00 V, max =  +6.85 V)   ALARM
> VBat:      +3.02 V
> fan1:     3308 RPM  (min =    0 RPM, div = 8)         
> fan2:        0 RPM  (min =    0 RPM, div = 8)         
> fan3:        0 RPM  (min =    0 RPM, div = 8)         
> M/B Temp:    +26 C  (low  =    -1 C, high =  +127 C)   sensor = invalid   ALARM
> CPU Temp:    +30 C  (low  =    -1 C, high =  +127 C)   sensor = invalid   ALARM
> Temp3:       +24 C  (low  =    -1 C, high =  +127 C)   sensor = invalid   ALARM

If 

a. you have an automated way to collect power supply stats over the long
term and build plots (using tools like Perl and gnuplot) 

and

b. you can detect the red-screen condition with software, like some
unique condition in the --log-status output, and log the status maybe
once an hour.

you could look for a correlation.
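For (a) and (b) together, here is a rough Python sketch. It assumes the `sensors` voltage lines look like the readout above. It also assumes that each hourly sample records a 0/1 red-screen flag derived, for example, from the --log-status output. `parse_sensor_volts` and `pearson` are hypothetical helpers. With 0/1 flags, the Pearson coefficient is the point-biserial correlation. With an event this rare, many samples are needed before any coefficient means much.

```python
import re
from math import sqrt

# Voltage lines as they appear in the readout above, e.g.
#   "+5V:       +4.95 V  (min =  +0.00 V, max =  +6.85 V)   ALARM"
# Other lm_sensors versions may format lines differently.
VOLT_RE = re.compile(r"\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?) V\b")


def parse_sensor_volts(text):
    """Return {label: volts} for each voltage line of `sensors` output."""
    volts = {}
    for line in text.splitlines():
        m = VOLT_RE.match(line)
        if m:
            volts[m.group(1).strip()] = float(m.group(2))
    return volts


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; with ys as 0/1
    red-screen flags this is the point-biserial correlation."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series of 2+ samples")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)


# Hypothetical hourly samples: +5V reading and red-screen flag (1 = red).
five_volt = [4.95, 4.95, 5.30, 4.95, 5.28, 4.95]
red_screen = [0, 0, 1, 0, 1, 0]
print(round(pearson(five_volt, red_screen), 3))
```

A strong coefficient would point at the supply but not prove it; a transient shorter than the sampling interval would never show up in hourly readings at all.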

That's a lot of work just to verify that the power supply has transients or
overvoltage conditions that cause the problem.  That's something I
would assign to one of my kids and tell them to make it double as a
school math or science project.

Regards,
Andy




