Standing on the roadside thinking

With an ordinary camera you can't photograph a whole train well. Up close, the train either doesn't fit in the frame, or it fits only with strong perspective, so the far end is too small to see clearly. From far away you can capture the complete train, but the subject is too small to show any detail. Recording a video lets you get close and still capture the whole train as it passes. But the view is limited to this small window: only a small part of the train is visible at any moment, so you still can't freely inspect the train as a whole.

However, since the video has recorded the complete train, you can simply stitch it back together like a jigsaw puzzle: append the newly revealed part of each frame to what has been assembled so far. Starting from the moment the train enters the video, keep extending the image with new content until the train leaves. The basic idea is shown in the figure below.

It is obvious that, done this way, perspective makes the picture uneven, and the seams don't line up well. If you narrow the window and take only a small strip from the middle of the picture, then as long as the strip is narrow enough its perspective distortion can be ignored, and the result is much more uniform. There is another advantage: it avoids interference from clutter around the scene. In theory, all you need is one narrow strip in the field of view through which the whole train can be seen unobstructed.

In that case, cut a fixed-width strip from the middle of every frame and join the strips together. Can this window be arbitrarily narrow (say, just 1 pixel wide)? Clearly the window width must be tied to how far the same point moves between two consecutive frames. If the window is narrower than this displacement, you lose information; if it is wider, content is repeated; if it equals the displacement, it is just right. For example, the figure below overlays two consecutive frames; the distance between the two positions of the same point should be used as the window width when stitching.
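To see why the window width should match the per-frame displacement, here is a toy one-dimensional simulation. It is only a sketch under simplifying assumptions: the scene is a string of letters, each frame's view slides along it by `shift` characters, and `stitch_strips` is a hypothetical helper that joins one `width`-character strip per frame. None of these names come from the actual program.

```python
def stitch_strips(scene, shift, width, position=0):
    """Join one `width`-wide strip per simulated frame.

    Frame i sees the scene starting at offset i * shift; the strip is
    taken at column `position` of that frame.
    """
    strips = []
    i = 0
    while i * shift + position + width <= len(scene):
        start = i * shift + position
        strips.append(scene[start:start + width])
        i += 1
    return "".join(strips)


scene = "ABCDEFGHIJKL"
print(stitch_strips(scene, shift=2, width=2))  # ABCDEFGHIJKL (just right)
print(stitch_strips(scene, shift=2, width=1))  # ACEGIK (information lost)
print(stitch_strips(scene, shift=2, width=3))  # ABCCDEEFGGHIIJK (content repeated)
```

Only the case where the strip width equals the shift reproduces the scene exactly.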

At the same time, the window should be as narrow as possible to reduce perspective distortion and uneven illumination. So when recording, try to make the object move slowly across the frame, that is, keep the displacement of a point between two frames as small as possible. Either film a slow-moving object or use a high-frame-rate camera. And of course, to keep the moving object from smearing within a frame, the shutter speed should be as fast as possible.

Let's work out how the shutter speed and the window width relate to the actual speed of motion. Using some known information, we can calculate the real-world size that one image pixel corresponds to at the plane of focus.

According to TB/T 1010-2016 "Types and basic dimensions of wheel sets and bearings for railway vehicles", the rolling circle diameter of most freight wagon wheels is 840 mm. From the figure, this wagon is a P62k, and according to the data for this model its axle load is 21 t. According to clause 5.8.a of GB/T 25024-2019 "Rolling stock bogie: Truck bogie", wheels for axle loads of 25 t and below shall have a rolling circle diameter of 840 mm. So the wheel in the figure can safely be taken as 840 mm in diameter.

This gives a simple scale: 1 pixel in the picture = 6.83 mm in reality. At a video frame rate of 60 fps, if you want a window width of 1 pixel, the train must travel at 6.83 × 60 mm/s, i.e. about 0.41 m/s. The higher the frame rate, the faster the train can be for the same window width, so to keep the window as narrow as possible you should record at the highest frame rate the equipment supports.
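The arithmetic above can be sketched in a few lines of Python. The two function names are hypothetical helpers introduced for illustration, and the 10 m/s train speed in the second call is an arbitrary example value, not a measurement from the video.

```python
def speed_for_window(mm_per_px, fps, window_px=1):
    """Train speed in m/s at which a point moves `window_px` pixels per frame."""
    return mm_per_px * window_px * fps / 1000.0


def window_for_speed(speed_m_s, fps, mm_per_px):
    """Window width in pixels that matches the per-frame displacement."""
    return speed_m_s * 1000.0 / fps / mm_per_px


scale = 6.83  # mm per pixel, from the 840 mm wheel
print(round(speed_for_window(scale, fps=60), 2))        # 0.41
print(round(window_for_speed(10.0, fps=60, mm_per_px=scale), 1))  # 24.4
```

The second call runs the relation the other way: for a known train speed it gives the window width to enter, which in practice is read off the overlay of two consecutive frames instead.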

Suppose the shutter speed is 1/8000 s. The highest train speed without motion blur is then 6.83 / (1/8000) mm/s, i.e. 54.64 m/s. (That is, during the exposure the train must not travel farther than the distance covered by one pixel, otherwise there will be motion blur. Compare the concept of the "circle of confusion".) The shutter has to be fast enough to freeze the moving object, but 1/8000 s is clearly not necessary, and too fast a shutter demands a higher sensitivity (ISO), which degrades image quality. Ordinary trains (the locomotive-hauled kind) have a maximum design speed of 160 km/h, so the estimated shutter speed only needs to be (6.83/1000) / (160/3.6) ≈ 1/6507 s. In practice my location is close to a railway station, so trains don't pass me at full speed, and the shutter can be slower still.

In theory, the slowest usable shutter speed can be estimated from the wheel. The complete conversion formula is

$$\text{shutter speed} = 1 \Big/ \frac{\text{wheel diameter (pixels)} \times \text{train speed}}{0.84\,(\times 3.6)}$$

where the factor 3.6 is included when the train speed is in km/h and omitted when it is in m/s.
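As a quick check of the formula above, here is a small sketch. `max_exposure_time` is a hypothetical helper, and the wheel diameter of about 123 pixels is not stated in the text; it is inferred from the 6.83 mm per pixel scale (840 / 6.83).

```python
def max_exposure_time(wheel_diameter_px, train_speed, speed_in_kmh=True,
                      wheel_diameter_m=0.84):
    """Longest exposure (seconds) during which the train moves at most one pixel."""
    speed_m_s = train_speed / 3.6 if speed_in_kmh else train_speed
    metres_per_pixel = wheel_diameter_m / wheel_diameter_px
    return metres_per_pixel / speed_m_s


wheel_px = 840 / 6.83  # about 123 px, implied by the 6.83 mm/pixel scale
t = max_exposure_time(wheel_px, 160)  # 160 km/h design speed
print("1/{:.0f} s".format(1 / t))  # 1/6507 s
```

Passing `speed_in_kmh=False` with 54.64 m/s gives back the 1/8000 s case discussed earlier.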

Write an automated program

Read video

The video is processed with Python, relying mainly on tools provided by cv2. First, open a video:

import cv2

video_path = './DSCF1150.MOV'
vc = cv2.VideoCapture(video_path)


At this point vc represents the loaded video file. vc.set(cv2.CAP_PROP_POS_FRAMES, 666666) places an imaginary playhead on frame 666666, and the next frame read starts from there. To start from the beginning, set it to 0 or simply load the video afresh.

The following calls return a set of properties of the video: fps is the number of frames per second, total_frames the total number of frames, and frame_width and frame_height the width and height of the video.

fps = vc.get(cv2.CAP_PROP_FPS)
total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
frame_width = int(vc.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(vc.get(cv2.CAP_PROP_FRAME_HEIGHT))


Each call to rval, frame = vc.read() reads the next frame: rval indicates whether the frame was read successfully, and frame is the image itself. So reading the whole video looks like this:

for i in range(total_frames):
    rval, frame = vc.read()
    if not rval:
        break


Each frame is read in as an np.ndarray, which is convenient for later operations. Note that cv2 uses BGR as its default color channel order: either remember to convert, or keep using the image processing functions that cv2 provides.

Stitching the picture

If a fixed-width strip is taken from every frame of the video and the strips are joined together, the width of the resulting image is the total number of frames × the window width. So we simply initialize an empty ndarray: img = np.empty((frame_height, total_frames * width, 3), dtype='uint8')

Cutting the desired strip out of a video frame is easy: frame[:, position:position+width, :] is all it takes. Here position is a column near the middle of the image, and it is also the left edge of the window.

There is a small pitfall when placing the strips into the empty array: the stitching order must be reversed depending on whether the train moves from left to right or from right to left.

  1. If the train moves from left to right, fill the array starting from the right-hand end, so the starting column is pixel_start = total_frames * width - (i + 1) * width
  2. Otherwise, fill from the left-hand end, and the starting column is pixel_start = i * width
  3. Here i is the frame index
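The two cases in the list above can be checked with a tiny sketch. `strip_start` is a hypothetical helper name, and an integer window width is assumed for simplicity.

```python
def strip_start(i, total_frames, width, left_to_right):
    """Destination column of frame i's strip in the stitched image."""
    if left_to_right:
        # The head of the train is seen first and belongs at the right-hand end.
        return total_frames * width - (i + 1) * width
    return i * width


total_frames, width = 5, 3
print([strip_start(i, total_frames, width, True) for i in range(total_frames)])
# [12, 9, 6, 3, 0]
print([strip_start(i, total_frames, width, False) for i in range(total_frames)])
# [0, 3, 6, 9, 12]
```

Either way the strips tile the 15-column image exactly once; only the fill direction differs.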

Good, all the key pieces are here. The complete core processing loop is as follows:

import math

import cv2
import numpy as np

# position and width are the window's left edge and width;
# v_left_right holds the direction flag (1 = left to right)
vc = cv2.VideoCapture(video_path)
vc.set(cv2.CAP_PROP_POS_FRAMES, 0)
total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
frame_height = int(vc.get(cv2.CAP_PROP_FRAME_HEIGHT))

img = np.empty((frame_height, int(total_frames * width), 3), dtype='uint8')
for i in range(total_frames):
    rval, frame = vc.read()
    if not rval:
        break
    if v_left_right.get():  # left to right: fill from the right-hand end
        pixel_start = int(total_frames * width) - int((i + 1) * width)
    else:  # right to left: fill from the left-hand end
        pixel_start = int(i * width)
    pixel_end = pixel_start + math.ceil(width)

    img[:, pixel_start:pixel_end, :] = frame[:, position:position + math.ceil(width), :]


In addition, you may want to take one column only every few frames, which means a window width that is a decimal between 0 and 1. In that case every calculation involving width needs explicit rounding, which is why int() and ceil() appear repeatedly in the code above.
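To make the effect of that rounding concrete, here is a small sketch of the destination columns in the right-to-left case. `strip_span` is a hypothetical helper that mirrors the int() and ceil() expressions used for pixel_start and pixel_end.

```python
import math


def strip_span(i, width):
    """Destination columns [start, end) for frame i, right-to-left case."""
    start = int(i * width)
    return start, start + math.ceil(width)


print([strip_span(i, 0.5) for i in range(4)])
# [(0, 1), (0, 1), (1, 2), (1, 2)]
print([strip_span(i, 1.5) for i in range(3)])
# [(0, 2), (1, 3), (3, 5)]
```

With a width of 0.5 each output column is written by two consecutive frames, the later one overwriting the earlier. With a width of 1.5 the spans are two columns wide and can overlap by a column, and the last span (3, 5) runs past the int(3 * 1.5) = 4 columns of the array, so a real implementation may need to clamp the end of the final strip.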

As for saving the picture, wouldn't calling cv2.imwrite directly be good enough? The catch is that if the picture is too long, a bug on Ubuntu prevents it from being displayed. So when saving, cut the image into pieces to avoid any single file being too large. Pick a fixed piece length first; while enough of the picture remains to cut a full piece, cut and save it, and when less than a full piece remains, save whatever is left. What I wrote here is not very elegant, though, and I hope some reader can offer a neater version.

flag = 0
i = 0
for i in range(int((total_frames * width) / split_width)):
    split_start = i * split_width
    split_end = split_start + split_width
    cv2.imwrite('{}/{}_{}.jpg'.format(save_dir, os.path.split(video_path.get())[-1].split('.')[-2], i),
                img[:, split_start:split_end, :])
    flag = 1
if not flag or (total_frames * width) % split_width:
    i += flag
    split_start = i * split_width
    cv2.imwrite('{}/{}_{}.jpg'.format(save_dir, os.path.split(video_path.get())[-1].split('.')[-2], i),
                img[:, split_start:, :])
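One possible tidier formulation, offered only as a sketch: let range() step through the image in split_width increments and clamp the last piece, so no flag variable is needed. `split_ranges` and `name` are hypothetical names, and the cv2.imwrite call is left as a comment because it depends on the surrounding program.

```python
def split_ranges(total_width, split_width):
    """Column ranges [start, end) that cut an image into pieces of at most split_width."""
    return [(start, min(start + split_width, total_width))
            for start in range(0, total_width, split_width)]


for idx, (start, end) in enumerate(split_ranges(10, 4)):
    print(idx, start, end)
    # In the real program each piece would be saved with something like:
    # cv2.imwrite('{}/{}_{}.jpg'.format(save_dir, name, idx), img[:, start:end, :])
```

This produces the same pieces as the loop above: full-length pieces first, then one shorter remainder if the total width is not an exact multiple of split_width.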


Improving the user experience

Some parameters in the code above vary from video to video. Opening the code to change them before every run is a hassle, and moving them to the top of the program does not make it much less so. Also, the program takes a while to run, and having no progress bar is unfriendly. And if the overlay of two adjacent frames could be shown right after loading the video, it would be easy to determine the window width.

All signs point the same way: we need an interface to receive parameters and display things.

Fortunately Python ships with tkinter, a simple GUI toolkit. Never mind stability or looks, let's just knock an interface together. And let's not worry about problems of the "a user walks into a bar and orders fried rice" kind; Lao Pan uses this himself and will never deliberately type something explosive into an input box (or will he?).

  • To show the overlay of two frames with some simple interaction, use matplotlib.
  • A position slider for the video is probably needed, to change which pair of adjacent frames is shown (for example, the start of the video may be empty, so frames should be taken from the middle).
  • I also want to choose the video to open and the path to save to; every GUI framework is bound to provide these two functions.
  • After opening the video, show fps, total frame count and other information; passing these values straight to a Label control is enough.
  • Input boxes are needed to receive some values: window position, window width, and the width of each split image.
  • A button that hands these values to the processing function and runs it.
  • A progress bar; the process happens to traverse the video frames from start to finish, so that position can serve as the progress.

First, every GUI needs a main window:

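import tkinter as tk
import tkinter.font as tf
from tkinter import *  # Frame, Scale, StringVar, LEFT, ... are used unqualified below
window = tk.Tk()
window.title('scanning video')
font = tf.Font(size=12)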


That said, the basic layout still has to be sensible, or even I won't find it easy to use. The image is shown on the left and the other controls go on the right:

frame_left = Frame(window)
frame_left.pack(side=LEFT, fill=BOTH, expand=YES)

frame_right = Frame(window)
frame_right.pack(side=LEFT, padx=10, expand=YES)


pack() places widgets in order within the given frame; its parameters control the arrangement. A widget only appears once it has been placed with pack() or another geometry manager.

Start with the left side: a canvas to display the image, a slider that acts as the video seek bar, and matplotlib's own toolbar.

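from matplotlib.figure import Figure
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg, NavigationToolbar2Tk

fig = Figure(figsize=(8, 4), dpi=72)
canvas = FigureCanvasTkAgg(fig, master=frame_left)
canvas.get_tk_widget().pack(side=TOP, fill=BOTH, expand=YES)
toolbar = NavigationToolbar2Tk(canvas, frame_left)
toolbar.update()

# display_frames (shown below) must be defined before this line runs
scrollbar_display = Scale(frame_left, orient=HORIZONTAL, from_=0, to=500,
                            resolution=1, command=display_frames)
scrollbar_display.pack(side=TOP, fill=X)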


About this Scale: every time it is dragged its value changes, and the function given as command is called with the current value. If that function is heavy (like mine), dragging feels laggy, so some optimization is worthwhile. Also, the to parameter that sets the Scale's range seemed to have to be a fixed number rather than a variable; the several approaches I tried all raised errors, and only a literal number worked.

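def display_frames(idx):
    global vc, canvas, fig
    idx = int(idx)
    vc.set(cv2.CAP_PROP_POS_FRAMES, idx)
    rval, frame_1 = vc.read()
    rval, frame_2 = vc.read()
    # Average the two consecutive frames; float32 is an assumed working dtype
    frame_overlay = ((frame_1.astype(np.float32) + frame_2.astype(np.float32)) * 0.5).astype(np.uint8)
    frame_overlay = cv2.cvtColor(frame_overlay, cv2.COLOR_BGR2RGB)
    fig.clf()  # drop the previous axes so redraws do not pile up
    ax = fig.add_subplot(111)
    ax.imshow(frame_overlay)
    canvas.draw()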
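On the Scale's to parameter: tkinter widget options can normally be changed after creation with configure(), so one thing worth trying is to set the range once the video has been opened and total_frames is known. The following is a minimal sketch, not the author's method; `update_scale_range` is a hypothetical helper, and the upper limit of total_frames - 2 is an assumption based on display_frames reading two consecutive frames.

```python
def update_scale_range(scale, total_frames):
    """Set the slider's upper limit to the last index with a following frame."""
    last_start = max(total_frames - 2, 0)
    scale.configure(to=last_start)
    return last_start


# Possible use at the end of open_video():
# update_scale_range(scrollbar_display, total_frames)
```

If this works in a given setup, the hard-coded to=500 is only a starting value that gets replaced as soon as a video is loaded.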


Now for the right side, which involves some logic. First, opening the video. I won't explain much here; the code speaks for itself.

from tkinter import filedialog

def open_video():
    global fig, vc, total_frames, frame_height

    video_path.set(filedialog.askopenfilename(title='choose a video'))
    vc = cv2.VideoCapture(video_path.get())
    fps = vc.get(cv2.CAP_PROP_FPS)
    total_frames = int(vc.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_width = int(vc.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(vc.get(cv2.CAP_PROP_FRAME_HEIGHT))
    label_video_attr['text'] = '... ......... ...\n' \
                               '... fps: {} ...\n' \
                               '... total_frames: {} ...\n' \
                               '... resolution: {}x{} ...\n' \
                               '... ......... ...'.format(fps, total_frames, frame_width, frame_height)

video_path = StringVar()
video_path.set('... select a video ...')
label_video_path = tk.Label(frame_right, textvariable=video_path, font=font)
bt_open_video = tk.Button(frame_right, text='open a video', command=open_video, font=font)


Next, choosing the save path. My design is that on saving, a folder with the same name as the video is created under this path and the split images are put inside. This can be seen in the processing function.

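def save_img():
    # Assumed body: ask for a directory and remember it in save_path
    save_path.set(filedialog.askdirectory(title='choose a save dir'))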

save_path = StringVar()
save_path.set('... select save dir ...')
label_save_path = tk.Label(frame_right, textvariable=save_path, font=font)
bt_save_img = tk.Button(frame_right, text='save dir', command=save_img, font=font)


Then display some information about the video. This is linked to open_video() above: as soon as a video is opened, its information is read and shown here.

label_video_attr = tk.Label(frame_right, font=font, text='... ......... ...\n'
                                                            '... info area ...\n'
                                                            '... ......... ...')


Next come three widgets that accept input values. To use the values in the program, read them with something like int(text_position.get()).

label_position = tk.Label(frame_right, text='position:', font=font)
text_position = tk.Entry(frame_right, font=font)

label_width = tk.Label(frame_right, text='width:', font=font)
text_width = tk.Entry(frame_right, font=font)

label_split_width = tk.Label(frame_right, text='split_width:', font=font)
text_split_width = tk.Entry(frame_right, font=font)
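Since the window width may be fractional while the position and split width are whole pixels, the three strings could be converted in one place. This is a minimal sketch with light checks, assuming a hypothetical helper `parse_settings` that receives the text of the three Entry widgets; it is not part of the original program.

```python
def parse_settings(position_text, width_text, split_width_text):
    """Convert the three Entry strings into (position, width, split_width)."""
    position = int(position_text)        # left edge of the window, in pixels
    width = float(width_text)            # may be fractional, e.g. 0.5
    split_width = int(split_width_text)  # width of each saved piece
    if position < 0 or width <= 0 or split_width <= 0:
        raise ValueError("position must be >= 0; width and split_width must be > 0")
    return position, width, split_width


print(parse_settings("960", "0.5", "10000"))  # (960, 0.5, 10000)
# Possible use inside process():
# position, width, split_width = parse_settings(
#     text_position.get(), text_width.get(), text_split_width.get())
```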


A progress bar, and a label to show whether processing has finished.

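from tkinter import ttk

def process():
    progressbar['maximum'] = total_frames
    for i in range(total_frames):
        # ... process frame i here ...
        progressbar['value'] = i + 1
        progressbar.update()  # let tkinter redraw; otherwise the bar only moves when the loop ends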

progressbar = ttk.Progressbar(frame_right, length=300, cursor='watch')
label_status = tk.Label(frame_right, text='Status: waiting...', font=font)


Radio buttons for selecting the direction of motion. Each option is given a value, and v_left_right.get() returns the value of the selected option.

def process():
    for i in range(total_frames):
        if v_left_right.get():
            pixel_start = int(total_frames * width) - int((i + 1) * width)
        else:
            pixel_start = int(i * width)

v_left_right = IntVar()
radio_left_to_right = tk.Radiobutton(frame_right, text='left to right',
                                        variable=v_left_right, value=1, font=font)
radio_right_to_left = tk.Radiobutton(frame_right, text='right to left',
                                        variable=v_left_right, value=0, font=font)


The most important button:

bt_process = tk.Button(frame_right, text='process', command=process, font=font)

Finally, the routine last step:

window.mainloop()


If everything goes well, the window looks like the figure. Use the toolbar to zoom into parts of the picture; you can also read off the coordinates of a point.


This way you can get photos from some tricky angles, for example (reusing the cover image):

(This can serve as a new divider image.)

This can also compress the video considerably. A video close to 1 GB yields images of only about 100 MB, which cuts the space used by 90% without losing image quality. (However, the sound is gone.) (Then again, you can't record sound at high frame rates anyway.)