Codejoy's questions -server

Codejoy

Asked: 2024-11-21 14:10:41 +0800 CST

Strange error using du -h -d1

5

I have an aging mail server that simply runs Courier, Postfix, and SMTP. It is a KVM VM with two drives, one for the OS and one for data. The data drive is very full (df reports 100% even though the size and used figures are about 24GB apart). I am not sure what is eating up space and then releasing it. top shows mostly just imapd processes (running as the postfix user) doing stuff. I cannot get iotop on this machine. To start freeing up space in users' mailboxes on the server, I figured I would run du -h -d1 to find the biggest offenders. This command now runs slower than it ever has, so I ran it inside a screen session:

du -h -d1 > mailboxsizes.txt

That way I could come back to it in the morning and see the usage. It wrote out about 6 mailboxes, the largest being 2.2GB, and then nothing. So I went to the machine's console to see whether the command was still running and saw this:

[root@xmail]# du -h -d1 > /root/mailboxsizes.txt
[14280.306953] INFO: task imapd:12559 blocked for more than 120 seconds.
[14280.307710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14280.309680] imapd           D ffff8800d3d9cd98     0 12559      1 0x00000080
[14280.310591]  ffff8800b17bbc20 0000000000000086 ffff880057bbce70 ffff8800b17bbfd8
[14280.310591]  ffff8800b17bbfd8 ffff8800b17bbfd8 ffff880057bbce70 ffff8800d3d9cd90
[14280.313532]  ffff8800d3d9cd94 ffff880057bbce70 00000000ffffffff ffff8800d3d9cd98
[14280.313532] Call Trace:
[14280.315669]  [<ffffffff8168d159>] schedule_preempt_disabled+0x29/0x70
[14280.316637]  [<ffffffff8168adb5>] __mutex_lock_slowpath+0xc5/0x1c0
[14280.316637]  [<ffffffff81208e17>] ? unlazy_walk+0x87/0x140
[14280.318543]  [<ffffffff8168a21f>] mutex_lock+0x1f/0x2f
[14280.319516]  [<ffffffff81683c93>] lookup_slow+0x33/0xa7
[14280.320690]  [<ffffffff8120c8f3>] path_lookupat+0x773/0x7a0
[14280.321718]  [<ffffffff81183775>] ? filemap_fault+0x215/0x410
[14280.321718]  [<ffffffff811de5e5>] ? kmem_cache_alloc+0x35/0x1e0
[14280.323363]  [<ffffffff8120f23f>] ? getname_flags+0x4f/0x1a0
[14280.324348]  [<ffffffff8120c94b>] filename_lookup+0x2b/0xc0
[14280.324348]  [<ffffffff81210367>] user_path_at_empty+0x67/0xc0
[14280.325307]  [<ffffffff811b1431>] ? handle_mm_fault+0x6b1/0xfe0
[14280.327150]  [<ffffffff812103d1>] user_path_at+0x11/0x20
[14280.327965]  [<ffffffff81203843>] vfs_fstatat+0x63/0xc0
[14280.328093]  [<ffffffff81203dae>] SYSC_newstat+0x2e/0x60
[14280.328093]  [<ffffffff81692875>] ? do_page_fault+0x35/0x90
[14280.330895]  [<ffffffff8168ea88>] ? page_fault+0x28/0x30
[14280.331790]  [<ffffffff8120408e>] SyS_newstat+0xe/0x10
[14280.331857]  [<ffffffff81697089>] system_call_fastpath+0x16/0x1b

I am new to sysadmin work and have zero idea what any of this is telling me, other than that it has something to do with imapd. I have rebooted this machine several times and it barely released any disk space or, seemingly, resources. I cannot figure out what is going on or why du failed the way it did. Mostly I am asking where to even start. While this machine is old and has always had its moments, it has never done this before. I do acknowledge the data drive is low on space, but if I clear some, something eats it up.

For completeness:

df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        2.9G     0  2.9G   0% /dev
tmpfs           2.9G     0  2.9G   0% /dev/shm
tmpfs           2.9G   41M  2.8G   2% /run
tmpfs           2.9G     0  2.9G   0% /sys/fs/cgroup
/dev/vda3        21G   18G  2.1G  90% /
/dev/vdb        459G  435G  442M 100% /mail
/dev/vda1       976M  119M  790M  14% /boot
tmpfs           581M     0  581M   0% /run/user/0
tmpfs           581M     0  581M   0% /run/user/1000


top - 06:06:53 up  6:52,  3 users,  load average: 36.42, 36.64, 31.74
Tasks: 346 total,   8 running, 338 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.4 us,  1.2 sy,  0.0 ni,  0.0 id, 89.9 wa,  0.0 hi,  0.0 si,  1.5 st
KiB Mem :  5946284 total,   130280 free,  2278016 used,  3537988 buff/cache
KiB Swap:  2516988 total,  1906332 free,   610656 used.  3362528 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
19174 postfix   20   0   27028   5504   1488 R   8.3  0.1   0:38.53 imapd       
19586 postfix   20   0   27028   5504   1488 D   7.6  0.1   0:32.18 imapd       
19008 postfix   20   0   27028   5504   1488 R   7.0  0.1   0:48.61 imapd       
19372 postfix   20   0   27464   5872   1504 D   4.3  0.1   0:30.38 imapd       
20087 postfix   20   0   27028   5504   1488 D   4.3  0.1   0:23.27 imapd       
20188 postfix   20   0   27028   5504   1488 D   4.3  0.1   0:23.31 imapd       
20353 postfix   20   0   27028   5508   1488 D   4.3  0.1   0:23.05 imapd       
19963 postfix   20   0   27028   5508   1488 D   4.0  0.1   0:23.85 imapd       
20275 postfix   20   0   27028   5508   1488 D   4.0  0.1   0:22.56 imapd       
18460 postfix   20   0   29348   5748   1588 R   3.7  0.1   0:38.09 imapd       
20236 postfix   20   0   27028   5516   1488 D   3.7  0.1   0:22.86 imapd       
   32 root      20   0       0      0      0 S   1.7  0.0   5:57.44 kswapd0     
20079 postfix   20   0   32728   9152   1520 S   1.7  0.2   0:01.90 imapd       
19702 postfix   20   0   27028   5516   1488 D   1.3  0.1   0:27.77 imapd       
18575 postfix   20   0   30472   6848   1596 D   1.0  0.1   0:14.86 imapd       
19782 postfix   20   0   27028   5508   1488 D   1.0  0.1   0:27.02 imapd       
 1026 root      20   0 1174028  22616   8992 S   0.7  0.4   2:53.90 fail2ban-s+

I am not sure what to look at or try next so that I can du some folders and find out whose old inboxes we are keeping around, then get rid of them to free up space and hopefully make the server perform better.

My only idea is to systemctl stop postfix for a bit, see whether du and ls work better, and double-check whether top is still pegged with that stopped.

Also, in case it is relevant, an iostat:
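
A minimal sketch of the kind of listing described above: the first-level directories under a path, largest first. The function name `biggest_dirs` and the default of 10 entries are made up for illustration, and `du -d1` assumes GNU or BSD du. It does nothing to reduce I/O load, so on a disk this busy it will still be slow.

```shell
# Hypothetical helper: list the largest first-level subdirectories of a
# directory, biggest first. Sizes are in KiB so that sort -n can order them.
biggest_dirs() {
    dir=${1:-.}
    count=${2:-10}
    # -k: sizes in KiB, -x: stay on this filesystem, -d1: one level deep.
    # du prints the total for "$dir" itself as its last line; sed '$d' drops it.
    du -k -x -d1 -- "$dir" 2>/dev/null | sed '$d' | sort -rn | head -n "$count"
}

# Usage (not run here):
#   biggest_dirs /mail 20 > /root/mailboxsizes.txt
```

Writing the result somewhere other than the full data drive, as in the usage line, avoids adding to the space problem.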

iostat
Linux 3.10.0-514.16.1.el7.x86_64 (xmail)    11/21/2024  _x86_64_    (3 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.95    0.01    1.29   88.86    0.65    3.24

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda              11.92       252.15        61.60    6335865    1547820
vdb            1517.62     62131.62        78.76 1561227117    1979120
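
The gap between size, used, and available mentioned at the top can be computed straight from df. A minimal sketch, assuming POSIX-format output in KiB (`df -Pk`); the helper name `df_gap_kib` is made up. On ext filesystems one common cause of such a gap is the block reserve kept for root, but that is an assumption here, not something the output above proves.

```shell
# Hypothetical helper: read one data line of `df -Pk` output on stdin and
# print size - used - available, in KiB.
df_gap_kib() {
    awk '{ printf "%d\n", $2 - $3 - $4 }'
}

# Usage (not run here):
#   df -Pk /mail | tail -n 1 | df_gap_kib
```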

Codejoy

Asked: 2023-09-14 01:50:37 +0800 CST

umount a bind source won't work, and device out of space

6

I think I may have shot myself in the foot. Long ago I tried to back up an old Linux machine to a NAS using a bind mount:

mount --bind / /mnt/src
tar -C /mnt/src -c . > /mnt/backup_to_nas/full-backup-`date '+%d-%B-%Y'`.tar.gz --exclude=tmp --exclude=mnt

Then I realized I never unmounted /mnt/src.

My question is: is this taking up double the space on /? I am woefully out of space and not sure if I am chasing my tail trying to delete files to recover space.

df -h shows:

[root@web-server mnt]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
devtmpfs                          1.9G     0  1.9G   0% /dev
tmpfs                             1.9G  4.0K  1.9G   1% /dev/shm
tmpfs                             1.9G  194M  1.7G  11% /run
tmpfs                             1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda2                         7.6G  7.6G     0 100% /
/dev/vda1                         190M  171M  5.3M  98% /boot
/dev/vdb                          230G  152G   67G  70% /usr/local
tmpfs                             379M     0  379M   0% /run/user/0
tmpfs                             379M     0  379M   0% /run/user/2527
tmpfs                             379M     0  379M   0% /run/user/2543
10.50.1.104:/data                 9.1T  8.0T  610G  94% /mnt/backup
tmpfs                             379M     0  379M   0% /run/user/2539
10.75.0.199://volume1/ICCBackups   32T  4.2T   28T  14% /mnt/backup_to_nas
tmpfs                             379M     0  379M   0% /run/user/500

lsblk shows:

[root@web-server mnt]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
vda    253:0    0     8G  0 disk 
├─vda1 253:1    0   200M  0 part /boot
└─vda2 253:2    0   7.8G  0 part /mnt/src
vdb    253:16   0 232.8G  0 disk /usr/local

I did a space check in /:

du -xhs * | sort -rh

134G    home
15G     data
6.9G    mnt
186M    run
170M    boot
52M     etc
848K    ARC-History.pdf
16K     lost+found
8.0K    export
8.0K    backup
4.0K    media
4.0K    check_permissions.py
0       sys
0       proc
0       lib64
0       lib
0       dev
0       bin

I don't understand why some large folders like /home are bigger than what df -h reports for /, so I ran mount | grep home and got:

[root@web-server mnt]# mount | grep home
/dev/vdb on /home type ext4 (rw,relatime,data=ordered)
/dev/vda2 on /home/weather/public_html/weather_rrd type ext4 (rw,relatime,data=ordered)
/dev/vda2 on /usr/local/home/weather/public_html/weather_rrd type ext4 (rw,relatime,data=ordered)
/dev/vdb on /home/workers/public_html/VM-SYSTEMS type ext4 (rw,relatime,data=ordered)
/dev/vdb on /usr/local/home/workers/public_html/VM-SYSTEMS type ext4 (rw,relatime,data=ordered)

It looks like if I could figure out what these are and how to relocate them, I might buy myself some breathing room:

/dev/vda2 on /home/weather/public_html/weather_rrd type ext4 (rw,relatime,data=ordered)
/dev/vda2 on /usr/local/home/weather/public_html/weather_rrd type ext4 (rw,relatime,data=ordered)

All this to say: did my mount --bind of / onto /mnt/src take up double the space? Is there something I didn't understand (probably) when doing this? I ran lsof /mnt/src and it seems that everything is using it.

The output did start with these warnings, though I am not sure whether they are relevant:

[root@web-server mnt]# lsof /mnt/src
lsof: WARNING: can't stat() ext4 file system /var/www/html/net-status/bw-mon (deleted)
      Output information may be incomplete.
lsof: WARNING: can't stat() ext4 file system /usr/local/www/net-status/bw-mon (deleted)
      Output information may be incomplete.

So I am not sure where to start deleting large files as I find them in / (even though it seems /home lives somewhere else). ls -lh doesn't show it as a symlink.

[root@web-server /]# ls -lh
total 1.2M
-rw-------    1 root root 847K Jul 27  2020 ARC-History.pdf
drwxr-xr-x    3 root root 4.0K Sep 13 17:21 backup
lrwxrwxrwx    1 root root    7 May 11  2018 bin -> usr/bin
dr-xr-xr-x.   5 root root 3.0K Aug 16 18:06 boot
-rw-r--r--    1 root root 3.3K May 12  2019 check_permissions.py
-rw-------    1 root root    0 Sep 13 17:33 core.20448
-rw-------    1 root root    0 Sep 13 17:46 core.28055
drwxr-xr-x    7 root root 4.0K Sep 13 17:16 data
drwxr-xr-x   18 root root 3.1K Sep 12 18:08 dev
drwxr-xr-x. 148 root root  12K Aug 31 17:26 etc
drwxr-xr-x    3 root root 4.0K Apr  2  2015 export
drwxr-xr-x   22 root root 4.0K Aug 27 18:06 home
lrwxrwxrwx    1 root root    7 May 11  2018 lib -> usr/lib
lrwxrwxrwx    1 root root    9 May 11  2018 lib64 -> usr/lib64
drwx------.   2 root root  16K Mar 31  2015 lost+found
drwxr-xr-x.   2 root root 4.0K Apr 12  2018 media
drwxr-xr-x.   6 root root 4.0K Feb  9  2023 mnt
drwxr-xr-x.   7 root root 4.0K Aug 25 15:13 opt
dr-xr-xr-x  225 root root    0 Nov  8  2021 proc
dr-xr-x---.  26 root root  12K Aug 25 15:19 root
drwxr-xr-x   48 root root 1.5K Sep 13 17:01 run
lrwxrwxrwx    1 root root    8 May 11  2018 sbin -> usr/sbin
-rw-r--r--    1 root root    0 May 15  2019 searchresults.txt
drwxr-xr-x.   2 root root 4.0K Apr 12  2018 srv
dr-xr-xr-x   13 root root    0 Nov 29  2021 sys
drwxrwxrwt.  16 root root 244K Sep 13 17:49 tmp
drwxr-xr-x.  14 root root 4.0K May 11  2018 usr
drwxr-xr-x.  25 root root 4.0K Aug  9 20:21 var
drwxr-xr-x    2 root root 4.0K Aug  9  2015 zaphod-data

EDIT: I looked at the fstab file, which shed some light on where stuff is:

UUID=c9d6c99f-d7a5-4117-93ba-029cc34d8b61 /                       ext4    defaults        1 1
UUID=19fcad32-0fcb-423a-87e9-586d03d2e406 /boot                   ext4    defaults        1 2
#LABEL=/home    /home       ext4 defaults 1 2
#192.41.211.105:/export/images      /export/images          nfs     rsize=32768,wsize=32768,actimeo=0,bg,intr
LABEL=local-web-server  /usr/local  ext4    defaults    1 2
/usr/local/home     /home       none    bind        0 0
/usr/local/www      /var/www/html   none    bind        0 0
/usr/local/data     /data       none    bind        0 0
/tmp/rrdweather     /home/weather/public_html/weather_rrd   none    bind    0 0
/usr/local/data     /data       none    bind        0 0
/home/workers/Site/VM-SYSTEMS /home/workers/public_html/VM-SYSTEMS none bind 0 0
#/home/workers/public_html/WebCalendar-1.2.3 /home/workers/public_html/WebCalendar none bind 0 0
#/home/workers/public_html/WebCalendar-1.2.0 /home/workers/public_html/WebCalendar~ none bind 0 0
/home/workers/public_html/net-status /usr/local/www/net-status none bind 0 0
/tmp/bw-mon     /var/www/html/net-status/bw-mon     none    bind        0 0
/var/lib/smokeping/images /var/www/html/smokeping/images none   bind        0 0

#mounting for our cheezy backup of web-server
10.50.1.104:/data /mnt/backup nfs

Edit 2: I realized that if I run the same du and sort command in /mnt/src, I get more accurate information:

[root@web-server src]# du -xhs * | sort -rh
4.8G    usr
707M    var
421M    opt
397M    backup
168M    root
150M    tmp
52M     etc
848K    ARC-History.pdf
16K     lost+found
12K     mnt
8.0K    export
4.0K    zaphod-data
4.0K    sys
4.0K    srv
4.0K    run
4.0K    proc
4.0K    media
4.0K    home
4.0K    dev
4.0K    data
4.0K    check_permissions.py
4.0K    boot
0       searchresults.txt
0       sbin
0       lib64
0       lib
0       core.28055
0       core.20448
0       bin

This shows there may be space to clear in usr (no idea what, I am not a Linux guru), and var had some old logs I was able to clear out. I am still working on it, but the crux of what I am asking is: should I really get /mnt/src unmounted? Or is it okay to let it ride like this, since every time I try to unmount it, it says it is busy?
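
One way to check whether /mnt/src holds a second copy or is just another view of the same files is to compare device and inode numbers. A minimal sketch, assuming GNU stat (`stat -c`); the helper name `same_inode` is made up. A bind mount exposes an existing directory tree at a second path without copying data, so a file seen through both paths should report the same pair.

```shell
# Hypothetical helper: succeed if two paths refer to the same file,
# that is, the same device number and the same inode number.
same_inode() {
    a=$(stat -c '%d:%i' -- "$1") || return 2
    b=$(stat -c '%d:%i' -- "$2") || return 2
    [ "$a" = "$b" ]
}

# Usage (not run here):
#   same_inode /etc/fstab /mnt/src/etc/fstab && echo "same file, no extra space used"
```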

Codejoy

Asked: 2022-08-24 09:53:51 +0800 CST

macos: env: python: No such file or directory when I use dot slash on a .py file

0

I thought all I had to do was add things to my PATH and PYTHONPATH, but macOS still cannot find Python when I run ./myfile.py.

my .zshrc:

export PATH="/Library/Frameworks/Python.framework/Versions/3.8/bin:/opt/homebrew/bin:$PATH"
PYTHONPATH="/Library/Frameworks/Python.framework/Versions/3.8/bin"
export PYTHONPATH

But nothing works. I can source .zshrc or open a new terminal to be sure, and if I run:

./myfile.py

I get the error:

env: python: No such file or directory

Do I also have to alias python to python3?
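
The message `env: python: No such file or directory` is what env prints when no executable named python is found on PATH, which suggests the script starts with `#!/usr/bin/env python` (an assumption, since the script itself is not shown). A minimal sketch for checking which interpreter names actually resolve; the helper name `first_available` is made up.

```shell
# Hypothetical helper: print the path of the first command name that
# resolves in the current shell, or fail if none of them does.
first_available() {
    for name in "$@"; do
        if path=$(command -v "$name" 2>/dev/null); then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

# Usage (not run here):
#   first_available python python3
```

If only python3 resolves, a shell alias would not help here, because env looks for an executable file on PATH and does not see aliases.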

Codejoy

Asked: 2022-04-14 10:10:18 +0800 CST

Stopping DRBD so I can run some tests with a VM

0

We have two servers I inherited, both running DRBD and each then running KVM virtual machines.

I would love to stop a VM running on server1 and bring up just that one VM on server2 for some tests. But with DRBD doing its thing on these servers and the broken startup script from server2 (posted below), it makes me nervous, as I don't want to stop server1 fully, just the one VM on it. I didn't create or configure these machines, and I doubt whether DRBD (which I know little about) was properly implemented. Server1's stop script and server2's start script are both posted below.

But before all that, I just want to know how to safely stop DRBD from mucking with the two servers for a time, so that I can mount a file system on server2 and bring up a VM that I stopped on server1.

Server1 site stop script:

echo    poweroff -p now
echo
read -rsp $'Press any key to continue...\n' -n1 key

virsh shutdown irsc
virsh shutdown backup
virsh shutdown user
virsh shutdown repository
virsh shutdown web-firewall
virsh shutdown wiki
virsh shutdown a-gateway
virsh shutdown b-gateway
virsh shutdown dhcp
 
# shutdown the drbd
#drbd-stop
echo now manually turn off drbd
echo     umount /systems
echo     drbdadm secondary all
echo     drbd-overview

Why drbd-stop is commented out, and why the script echoes the things it should be doing, I have no idea. But okay, that's the stop script. Server1's img files for the KVM guests live in /systems, by the way.

So I go to server2. First issue: the /systems folder has no img files in it, but there is a mount line in the startup script. Here is the start script for server2 (I have no idea what the nodedev-detach pci lines are really doing):
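
For comparison, a common way to get this print-instead-of-run behaviour without commenting commands out is a dry-run switch. A minimal sketch; the `run` helper and the `DRY_RUN` variable are made up for illustration and are not part of the inherited scripts.

```shell
# Hypothetical helper: print the command when DRY_RUN is 1 (the default
# in this sketch), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# Usage (not run here), mirroring the echoed lines of the stop script:
#   run umount /systems
#   run drbdadm secondary all
#   run drbd-overview
```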

#!/bin/sh
# isolate the CPUs for the VMs
#site-isolate

# backup 192 network
virsh nodedev-detach pci_0000_06_10_2
# 10.7
virsh nodedev-detach pci_0000_02_10_0
# 10.5
virsh nodedev-detach pci_0000_06_10_3
# 10.2
virsh nodedev-detach pci_0000_02_10_1

# a-gateway
# 192
virsh nodedev-detach pci_0000_06_10_0
# 10.5
virsh nodedev-detach pci_0000_06_10_1
# 10.7
virsh nodedev-detach pci_0000_02_10_4

# b-gateway
# 192
virsh nodedev-detach pci_0000_06_10_4
# 10.2
virsh nodedev-detach pci_0000_02_10_5

# dhcp
# 10.5
virsh nodedev-detach pci_0000_06_10_7
# 10.7
virsh nodedev-detach pci_0000_02_11_0
# 10.2
virsh nodedev-detach pci_0000_02_11_1

# dns2
# 192
virsh nodedev-detach pci_0000_06_11_0

# web-server
# 10.7
virsh nodedev-detach pci_0000_02_11_4

# web-firewall
# 192
virsh nodedev-detach pci_0000_06_10_6
# 10.7
virsh nodedev-detach pci_0000_02_12_4
# 10.2
virsh nodedev-detach pci_0000_02_11_5

# irsc
# 10.7
virsh nodedev-detach pci_0000_02_13_0
# BTTV
virsh nodedev-detach pci_0000_09_00_0

# firewall
# 10.25
virsh nodedev-detach pci_0000_02_12_1
# 10.5
virsh nodedev-detach pci_0000_06_11_1

# bro-server
# 192
virsh nodedev-detach pci_0000_06_11_2

echo start drbd
# start the disk mirror with the slave
service drbd start
sleep 2

# now setup drbd and filesystems

# for all VM images, mount the /systems
drbdadm primary systems
mount /dev/drbd/by-res/systems /systems

# for arc-gateway
drbdadm primary arc-gateway-data

# for backup
drbdadm primary archive
drbdadm primary amanda

# for user computer
# for user computer
drbdadm primary users

# for web server computer
drbdadm primary web-server

# for wiki
drbdadm primary svn

# for irsc. *** this is the one I want to bring up?  do I have to do this drbdadm primary irsc
drbdadm primary irsc

echo start vms
# start the VMs
# fundamental servers
virsh start dns2
virsh start dhcp
# take a long time to start servers
virsh start devel1
virsh start xmail
# gateways, sdss-gateway takes a long time
virsh start sdss-gateway
virsh start arc-gateway
virsh start user
# APO servers
virsh start web-server
virsh start backup
virsh start repository
virsh start wiki
virsh start irsc

# finally web firewall, now online to the world
virsh start web-firewall

Codejoy

Asked: 2021-10-29 11:02:19 +0800 CST

NFS mount a user cannot write gets permission denied. Gid, UID match and am not using all_squash

2

Both server and client are CentOS 7.0.

My data VM has an exports file:

/data   10.75.0.0/24(rw,sync,no_subtree_check) 10.50.1.0/24(rw,sync,no_subtree_check,no_root_squash)

My client has a fstab:

10.50.1.248:/data/archive/images /export/images nfs rsize=32768,wsize=32768,actimeo=0,bg,intr

Sure enough, the client user arc can ls /export/images, but if I cd into it and try to touch a file:

[arc@megamcachine images]$ touch somefile
touch: cannot touch ‘somefile’: Permission denied

The id of that user on the data VM is:

id arc
uid=1001(arc) gid=1001(arc) groups=1001(arc),10000(canwrite),10001(tron)

The id arc on the client is:

 id arc
uid=1001(arc) gid=1001(arc) groups=1001(arc),10000(datawrite),10001(tron)

I am not sure what I am missing. I have another machine mounting this NFS export and its users work just fine (a caveat: those users are all in the users and groups that own these files), but arc is in datawrite (canwrite), so I am not sure where the permission denied is coming from. If I go to the data VM that exports the NFS share and su to the arc user, I can write in the images folder with no issue.

The arc user can write to disk on the server itself:

[arc@35M_DATA images]$ touch somefile
[arc@35M_DATA images]$ ls -lh
-rwxrwxr-x.   1 tron datawrite 9.6K Apr  1  2021 q4list
-rw-rw-r--.   1 arc  datawrite 0 Nov  8 11:58 somefile     <---
drwxrwxr-x. 411 tron datawrite 12K Nov  7 17:15 tcam
drwxrwsr-x.   3 tron datawrite 4.0K Sep 29 18:12 tmp
drwxrwsr-x.   2 tron datawrite 4.0K Oct 25  2011 ytgold
[arc@35M_DATA images]$

[arc@35M_DATA images]$ ls -ld
drwxrwsr-x. 257 tron datawrite 12288 Nov  8 11:58 .

From client:

[arc@ecamera-icc images]$ ls -ld
drwxrwsr-x 257 500 datawrite 12288 Nov  8 09:58 .
[arc@ecamera-icc images]$
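
The id output above shows gid 10000 under different names on the two machines (canwrite on the data VM, datawrite on the client), so comparing numeric IDs avoids the naming confusion. A minimal sketch, assuming GNU stat in the usage line; the helper name `gid_in_list` is made up.

```shell
# Hypothetical helper: succeed if the first argument (a numeric gid)
# appears among the remaining arguments (a list of numeric gids).
gid_in_list() {
    target=$1
    shift
    for gid in "$@"; do
        [ "$gid" = "$target" ] && return 0
    done
    return 1
}

# Usage (not run here). $(id -G arc) is left unquoted on purpose so that
# each gid becomes its own argument:
#   gid_in_list "$(stat -c %g /export/images)" $(id -G arc) && echo "arc is in the owning group"
```

Running the same check on both the client and the data VM shows whether the two machines agree on the numbers, whatever the names say.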

Codejoy

Asked: 2021-10-13 15:21:13 +0800 CST

script backups sqlite database, when ran as a cron the db and names are mangled

0

I have a crontab:

 * * * * * /home/ipa/web/backup.sh > /dev/null 2>&1

(No, it doesn't normally run every minute; I am just testing here.)

backup.sh contains this:

#!/usr/bin/env sh



sqlite3 /home/ipa/web/ipa_django/mysite/db.sqlite3 ".backup 'backup_file.sqlite3'"
src="/home/ipa/web/backup_file.sqlite3"
let seconds=$(date +%H)*3600+$(date +%M)*60+$(date +%S)
echo $seconds
filename="db.sqlite3"
echo $filename.$seconds
dest="/home/ipa/web/db_backups/"$filename.$seconds
cp  $src $dest
cd /home/ipa/web/db_backups
tar -cvzf ipadbbackup.tar.gz $filename.$seconds
cd /home/ipa/web/
cp /home/ipa/web/db_backups/ipadbbackup.tar.gz ipadbbackup.tar.gz
rm /home/ipa/web/db_backups/$filename.$seconds
rm /home/ipa/web/db_backups/ipadbbackup.tar.gz
#rm "$srcfile"
/usr/bin/bash start-app.sh;
echo "Running email backup"
python2.7 backup_via_email.py
rm ipadbbackup.tar.gz

The idea is that I copy the database to a scratch area, zip it up, and copy it to where another .py file can find it and email it off as a backup.

The problem is:

If I run this script from where it lives, /home/ipa/web/, with:

./backup.sh

it works great and I get the file in my email: db.sqlite3.77627

or similar. The problem is that when it runs from cron, the file is not complete and the file name is:

db.sqlite3.

I cannot figure out what about running from cron makes it fail. The file in the tar is also 2.1k smaller. I am not sure what is going on, or even where to look.
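
One thing that stands out is that `let` is a bash builtin, while the script's first line asks for sh. Whether that explains the empty suffix under cron depends on what sh is on this machine, which is not stated here. A minimal sketch of the same seconds-since-midnight calculation in POSIX arithmetic; the function name is made up. It strips one leading zero from each field, since shell arithmetic treats a leading zero as octal and rejects 08 and 09.

```shell
# Hypothetical helper: seconds since midnight from HH MM SS.
# Expects two-digit fields as printed by date +%H, +%M and +%S.
seconds_since_midnight() {
    h=${1#0}
    m=${2#0}
    s=${3#0}
    echo $(( h * 3600 + m * 60 + s ))
}

# Usage (not run here); date is called once so the three fields agree:
#   seconds=$(seconds_since_midnight $(date '+%H %M %S'))
```

Separately, the `.backup 'backup_file.sqlite3'` target is a relative path, so it lands in whatever directory the job starts in, and cron does not start jobs in /home/ipa/web unless that is the user's home directory.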

Codejoy

Asked: 2021-06-12 09:33:03 +0800 CST

Let openldap users change password with passwd in centos, i broke it

1

I tried to do the above with this tutorial:

https://www.unixguide.net/content/openldap-allow-users-change-their-password-unix-passwd-command

So I created this ldif:

dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcAccess
olcAccess: to attrs=userPassword by self write by anonymous auth by dn.base="cn=ldapadm,dc=bbb,dc=local" write by * none

add: olcAccess
olcAccess: to * by self write by dn.base="cn=ldapadm,dc=bbb,dc=local" write by * read

I ran ldapmodify, and now no user can log into any client with their password, although they could before I ran the modify.

Attempting to log in now says permission denied after the correct password is entered. What did I break? (I am totally new to OpenLDAP.)

In case it is relevant, this is how I connected my clients to my OpenLDAP server:

yum install -y openldap-clients nss-pam-ldapd
authconfig --enableldap --enableldapauth --ldapserver=192.168.1.10 --ldapbasedn="dc=bbb,dc=local" --enablemkhomedir --update

Out of the box, if I type passwd as an LDAP user, this happens:

[ldapuser@sdss5-db ~]$ passwd
Changing password for user ldapuser.
(current) LDAP Password: 
New password: 
Retype new password: 
password change failed: Insufficient access
passwd: Authentication token manipulation error

Again, the ldif file above with the olcAccess lines broke my LDAP and didn't make anything work. (I had to revert the VM to before I ran that command, mostly because I am new to LDAP and don't know how to remove items.)

Here are all my cn=config files:

olcDatabase={-1}frontend.ldif


# AUTO-GENERATED FILE - DO NOT EDIT!! Use ldapmodify.
# CRC32 daf543d1
dn: olcDatabase={-1}frontend
objectClass: olcDatabaseConfig
objectClass: olcFrontendConfig
olcDatabase: {-1}frontend
structuralObjectClass: olcDatabaseConfig
entryUUID: 1244881e-5cf7-103b-94a5-5f5943b4315f
creatorsName: cn=config
createTimestamp: 20210608224613Z
entryCSN: 20210608224613.408737Z#000000#000#000000
modifiersName: cn=config
modifyTimestamp: 20210608224613Z


olcDatabase={0}config.ldif

# AUTO-GENERATED FILE - DO NOT EDIT!! Use ldapmodify.
# CRC32 54d58ed2
dn: olcDatabase={0}config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcAccess: {0}to * by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=extern
 al,cn=auth" manage by * none
structuralObjectClass: olcDatabaseConfig
entryUUID: 12448a9e-5cf7-103b-94a6-5f5943b4315f
creatorsName: cn=config
createTimestamp: 20210608224613Z
entryCSN: 20210608224613.408801Z#000000#000#000000
modifiersName: cn=config
modifyTimestamp: 20210608224613Z



olcDatabase={1}monitor.ldif

# AUTO-GENERATED FILE - DO NOT EDIT!! Use ldapmodify.
# CRC32 3165478b
dn: olcDatabase={1}monitor
objectClass: olcDatabaseConfig
olcDatabase: {1}monitor
structuralObjectClass: olcDatabaseConfig
entryUUID: 12448d32-5cf7-103b-94a7-5f5943b4315f
creatorsName: cn=config
createTimestamp: 20210608224613Z
olcAccess: {0}to * by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=extern
 al, cn=auth" read by dn.base="cn=ldapadm,dc=bbb,dc=local" read by * none
entryCSN: 20210608225001.645649Z#000000#000#000000
modifiersName: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
modifyTimestamp: 20210608225001Z




olcDatabase={2}hdb.ldif


# AUTO-GENERATED FILE - DO NOT EDIT!! Use ldapmodify.
# CRC32 89413e34
dn: olcDatabase={2}hdb
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
olcDatabase: {2}hdb
olcDbDirectory: /var/lib/ldap
olcDbIndex: objectClass eq,pres
olcDbIndex: ou,cn,mail,surname,givenname eq,pres,sub
structuralObjectClass: olcHdbConfig
entryUUID: 1244907a-5cf7-103b-94a8-5f5943b4315f
creatorsName: cn=config
createTimestamp: 20210608224613Z
olcSuffix: dc=bbb,dc=local
olcRootDN: cn=ldapadm,dc=bbb,dc=local
olcRootPW:: e1NTSEF9QTB0dS94UjR6cy83ZEMvQUxPL21uS2RLaXZUeFNXVEg=
olcAccess: {0}to attrs=userPassword by self write by anonymous auth by dn.ba
 se="cn=ldapadm,dc=bbb,dc=local" write by * none
entryCSN: 20210702202550.687485Z#000000#000#000000
modifiersName: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
modifyTimestamp: 20210702202550Z

It seems it is not writing the second portion of:

add: olcAccess
olcAccess: to * by self write by dn.base="cn=ldapadm,dc=unixguide,dc=net" write by * read

to olcDatabase={2}hdb.ldif; going by the example, it should have olcAccess: {1}to * by self write by dn.base="cn=ldapadm,dc=unixguide,dc=net" write by * read

I am guessing this is what is not working and what nukes the ability to log in after I run the command. I am not sure why it is not showing up, though, as I get no errors when I run ldapmodify with the ldif posted above.

The result from ldapmodify is this:

[root@openldapserver ~]# ldapmodify -Y EXTERNAL  -H ldapi:/// -f passwordaccess.ldif
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
modifying entry "olcDatabase={2}hdb,cn=config"
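
One detail worth checking against the LDIF format (RFC 2849): within a single `changetype: modify` record, consecutive modifications are separated by a line containing only `-`, and a blank line ends the record. In the ldif above the two `add: olcAccess` blocks are separated by a blank line, which may be why only the first rule shows up in olcDatabase={2}hdb.ldif. A minimal sketch that writes the same two rules with the separator; the function name is made up, and this is untested against this server.

```shell
# Hypothetical helper: write the two access rules as ONE modify record,
# with "-" between the two modifications and no blank line inside it.
write_password_acl_ldif() {
    cat > "$1" <<'EOF'
dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcAccess
olcAccess: to attrs=userPassword by self write by anonymous auth by dn.base="cn=ldapadm,dc=bbb,dc=local" write by * none
-
add: olcAccess
olcAccess: to * by self write by dn.base="cn=ldapadm,dc=bbb,dc=local" write by * read
EOF
}

# Usage (not run here):
#   write_password_acl_ldif passwordaccess.ldif
```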

Codejoy

Asked: 2021-06-12 08:59:13 +0800 CST

Trying to get sudoers working on openldap/centos7

0

I was following this tutorial here:

https://kifarunix.com/how-to-configure-sudo-via-openldap-server/

A lot of it made sense, but I am still new to OpenLDAP, so some of this is cryptic too. I have OpenLDAP running with users authenticating on other machines, and it even works with phpLDAPadmin. So it was time to get sudoers working for some users. I ran the sudoers2ldif command, got a file similar to the one shown in the tutorial, and edited it accordingly. When it came time to run ldapadd -Y EXTERNAL -H ldapi:/// -f sudoers_modified.ldif, it failed with the error:

SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
adding new entry "cn=defaults,ou=SUDOers,dc=apo,dc=local"
ldap_add: Invalid syntax (21)
    additional info: objectClass: value #1 invalid per syntax

Is the 21 the line number in the .ldif file, or some other error code? I also have no idea what is invalid about the objectClass line. The ldif file is posted below.

dn: cn=defaults,ou=SUDOers,dc=bbb,dc=local
objectClass: top
objectClass: sudoRole
cn: defaults
description: Default sudoOption's go here
sudoOption: !visiblepw
sudoOption: always_set_home
sudoOption: match_group_by_gid
sudoOption: always_query_group_plugin
sudoOption: env_reset
sudoOption: env_keep =  "COLORS DISPLAY HOSTNAME HISTSIZE KDEDIR LS_COLORS"
sudoOption: env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE"
sudoOption: env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES"
sudoOption: env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE"
sudoOption: env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY"
sudoOption: secure_path = /sbin:/bin:/usr/sbin:/usr/bin

dn: cn=sudo,OU=SUDOers,dc=bbb,dc=local
objectClass: top
objectClass: sudoRole
cn: sudo
sudoUser: bobby
sudoHost: ALL
sudoRunAsUser: ALL
sudoCommand: ALL

Maybe sudoRole needs to be added somehow? The other ldif, which I added successfully, was:

dn: ou=SUDOers,dc=bbb,dc=local
objectCLass: top
objectClass: organizationalUnit
ou: SUDOers
description: BBB SUDOers container

I found another tutorial here:

https://forums.centos.org/viewtopic.php?t=73807

It has similar, slightly different information. I didn't use it because one of the posted ldif files had a ton of content marked 'autogenerated' and I had no idea how it was produced or where it came from.

After the one answer, I believe the file shown at the above link is the one with this data:

vi /testfolder/sudoers.ldif
#------------------------
# AUTO-GENERATED FILE - DO NOT EDIT!! Use ldapmodify.
# CRC32 b181185c
dn: cn=sudoers,cn=schema,cn=config
objectClass: olcSchemaConfig
cn: sudoers
olcAttributeTypes: {0}( 1.3.6.1.4.1.15953.9.1.1 NAME 'sudoUser' DESC 'User(s
 ) who may run sudo' EQUALITY caseExactIA5Match SUBSTR caseExactIA5Substrin
 gsMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.26 )
olcAttributeTypes: {1}( 1.3.6.1.4.1.15953.9.1.2 NAME 'sudoHost' DESC 'Host(s
 ) who may run sudo' EQUALITY caseExactIA5Match SUBSTR caseExactIA5Substring
 sMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.26 )
olcAttributeTypes: {2}( 1.3.6.1.4.1.15953.9.1.3 NAME 'sudoCommand' DESC 'Com
 mand(s) to be executed by sudo' EQUALITY caseExactIA5Match SYNTAX 1.3.6.1.4
 .1.1466.115.121.1.26 )
olcAttributeTypes: {3}( 1.3.6.1.4.1.15953.9.1.4 NAME 'sudoRunAs' DESC 'User(
 s) impersonated by sudo (deprecated)' EQUALITY caseExactIA5Match SYNTAX 1.3
 .6.1.4.1.1466.115.121.1.26 )
olcAttributeTypes: {4}( 1.3.6.1.4.1.15953.9.1.5 NAME 'sudoOption' DESC 'Opti
 ons(s) followed by sudo' EQUALITY caseExactIA5Match SYNTAX 1.3.6.1.4.1.1466
 .115.121.1.26 )
olcAttributeTypes: {5}( 1.3.6.1.4.1.15953.9.1.6 NAME 'sudoRunAsUser' DESC 'U
 ser(s) impersonated by sudo' EQUALITY caseExactIA5Match SYNTAX 1.3.6.1.4.1.
 1466.115.121.1.26 )
olcAttributeTypes: {6}( 1.3.6.1.4.1.15953.9.1.7 NAME 'sudoRunAsGroup' DESC '
 Group(s) impersonated by sudo' EQUALITY caseExactIA5Match SYNTAX 1.3.6.1.4.
 1.1466.115.121.1.26 )
olcAttributeTypes: {7}( 1.3.6.1.4.1.15953.9.1.8 NAME 'sudoNotBefore' DESC 'S
 tart of time interval for which the entry is valid' EQUALITY generalizedTim
 eMatch ORDERING generalizedTimeOrderingMatch SYNTAX 1.3.6.1.4.1.1466.115.12
 1.1.24 )
olcAttributeTypes: {8}( 1.3.6.1.4.1.15953.9.1.9 NAME 'sudoNotAfter' DESC 'En
 d of time interval for which the entry is valid' EQUALITY generalizedTimeMa
 tch ORDERING generalizedTimeOrderingMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1
 .24 )
olcAttributeTypes: {9}( 1.3.6.1.4.1.15953.9.1.10 NAME 'sudoOrder' DESC 'an i
 nteger to order the sudoRole entries' EQUALITY integerMatch ORDERING intege
 rOrderingMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.27 )
olcObjectClasses: {0}( 1.3.6.1.4.1.15953.9.2.1 NAME 'sudoRole' DESC 'Sudoer
  Entries' SUP top STRUCTURAL MUST cn MAY ( sudoUser $ sudoHost $ sudoCommand
  $ sudoRunAs $ sudoRunAsUser $ sudoRunAsGroup $ sudoOption $ sudoOrder $ su
 doNotBefore $ sudoNotAfter $ description ) )

Once I realized the file was the schema, I added it and finally got this all working. So in a roundabout way I accepted the answer, even though I had to dig into what I was doing a bit more. By the way, the schema wasn't loaded in my LDAP; I had to add it via ldapadd.

Codejoy

Asked: 2021-06-10 10:35:28 +0800 CST

OpenLDAP and NFS server both work, but /home/user cannot be created for new LDAP users unless I log into the NFS server first

0

I have an OpenLDAP server that I set up on CentOS 7. I integrated it with all my other VMs, which mount their /home over NFS from an NFS server.

I just figured out that if I create a new LDAP user and try to log into some VM, it lets me log in but reports that it cannot create /home/user and is unable to chdir to it.

But I also learned that if I first run ssh user@mynfsserver, it logs in and creates the appropriate /home/user. After that I can ssh to any other VM as my LDAP user and it works just fine; it no longer complains about being unable to create the folder in /home for that user.

I use autofs on each VM with a home.map file, it looks to have the right permissions:

* -fstype=nfs,rw,nosuid,soft 10.10.1.139:/home/&

So this feels like some sort of permission issue: users get errors logging into a VM with their newly created LDAP credentials, but once that same user has logged into 10.10.1.139 (the NFS server that /home is mapped from), they can log into the VMs with no more "unable to create /home/user" errors.
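For readers less familiar with autofs wildcard maps, here is a minimal Python sketch of how that one map line is expanded. It only illustrates the `*` and `&` substitution and is not autofs code; the helper `expand_wildcard_entry` and the user name `alice` are made up for the example.

```python
def expand_wildcard_entry(map_line, key):
    """Expand an autofs wildcard map line for one lookup key.

    '*' as the map key matches any directory name requested under the
    mount point, and every '&' in the location is replaced by that name.
    """
    map_key, options, location = map_line.split(None, 2)
    if map_key != "*":
        raise ValueError("this sketch only handles the wildcard key '*'")
    return options, location.replace("&", key)


# The map line from the post; 'alice' is a hypothetical LDAP user.
HOME_MAP_LINE = "* -fstype=nfs,rw,nosuid,soft 10.10.1.139:/home/&"

options, location = expand_wildcard_entry(HOME_MAP_LINE, "alice")
print(options)   # -fstype=nfs,rw,nosuid,soft
print(location)  # 10.10.1.139:/home/alice
```

In other words, a login as alice asks the NFS server for /home/alice itself, so that directory would have to exist on 10.10.1.139 already. That would be consistent with the first login on the NFS server making everything work, though nothing in the post confirms it.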

Does my OpenLDAP server have to be made aware of the NFS server somehow?

Aside from the hiccup of having to log into the NFS server first, I can go to another VM, touch a file in that home folder, and it shows up on any other VM I log into. So it is about 95% working; it is just annoying to have to log into the NFS server as the LDAP user first so that /home/user exists for the other VMs.

Codejoy

Asked: 2021-03-12 11:17:30 +0800 CST

simple systemd service and socket failing

0

I had a service I was trying to run this way, but it was a fairly large Python program. I took a step back and built a dead simple Python program to see if I could get that to run. It fails when I try to connect to its socket via telnet. Below are the .socket, .service and .py files.

testPy.socket

[Unit]
Description=Socket to TESTPY for connection
PartOf=testPy.service

[Socket]
ListenStream=30001

[Install]
WantedBy=sockets.target

testPy.service

[Unit]
Description=TEST PY
After=network.target testPy.socket
Requires=testPy.socket
[Service]
ExecStart=/home/workers/miniconda2/bin/python /home/workers/testPy.py
StandardInput=socket
[Install]
WantedBy=default.target

testPy.py

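import sys

END_OF_LINE = '\r\n'

while True:
    line = sys.stdin.readline()
    if not line:
        # readline() returns '' only at EOF; stop instead of spinning forever
        break
    buffer = line.strip()
    if not buffer:
        # blank line: acknowledge and keep listening
        sys.stdout.write("OKAY DUDE")
        sys.stdout.flush()
        continue
    if buffer in ['quit', 'QUIT']:
        break
    # echo the command back to the caller
    sys.stdout.write('\n' + buffer + END_OF_LINE)
    sys.stdout.flush()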

Now, if I run this from a command line, it runs fine: it echoes anything back, and I can type quit and it exits the loop.

If I say:

systemctl start testPy.socket

and then type:

telnet localhost 30001

it connects briefly and then drops. The various statuses are (to me) not very descriptive:

systemctl status testPy.socket

● testPy.socket - Socket to TESTPY for connection
   Loaded: loaded (/etc/systemd/system/testPy.socket; disabled; vendor preset: disabled)
   Active: failed (Result: service-failed-permanent) since Thu 2021-03-11 13:59:54 EST; 11min ago
   Listen: [::]:30001 (Stream)

Mar 11 13:59:42 dhcp-093.apo.nmsu.edu systemd[1]: Listening on Socket to TESTPY for connection.
Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: Unit testPy.socket entered failed state.

systemctl status testPy.service

● testPy.service - TEST PY
   Loaded: loaded (/etc/systemd/system/testPy.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2021-03-11 13:59:54 EST; 12min ago
  Process: 2087 ExecStart=/home/workers/miniconda2/bin/python /home/workers/testPy.py (code=exited, status=1/FAILURE)
 Main PID: 2087 (code=exited, status=1/FAILURE)

Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: Started TEST PY.
Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: testPy.service: main process exited, code=exited, status=1/FAILURE
Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: Unit testPy.service entered failed state.
Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: testPy.service failed.
Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: start request repeated too quickly for testPy.service
Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: Failed to start TEST PY.
Mar 11 13:59:54 dhcp-093.apo.nmsu.edu systemd[1]: testPy.service failed.

I believe that if I can get this simple test to work, I can get the larger .py file I need to run, as it works essentially the same way. I have a service and socket built for that one, with generally the same errors, though systemctl status kosmos.service still reports failed while showing the main PID as status=0/SUCCESS, which is odd.

It says the start limit is the failure, but if a service as simple as this one has to be started over and over, something else must be wrong. I am guessing it is a config problem in my socket or service file, but I am not sure what. I was hoping my Python would not have to change at all: it would keep reading with sys.stdin.readline, and the lines it read would simply come from a connection made to that port (30001) from another machine. I thought that was exactly what all this socket stuff does. (All this came about because it used to run on an older machine under xinetd.)
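A possible explanation for the status=1 exit, offered as an assumption rather than something confirmed here: a socket unit without Accept=yes hands the service the listening socket, and a listening socket cannot be read from; only the per-connection socket returned by accept() can. The sketch below shows that difference with plain Python sockets, outside systemd. The function name is made up for the illustration.

```python
import socket


def listener_vs_connection():
    """Show that only an accepted connection can be read line by line."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # 1) Reading from the listening socket itself raises an error.
    listener.setblocking(False)      # make sure the probe can never hang
    try:
        listener.recv(1024)
        listener_readable = True
    except OSError:
        listener_readable = False
    listener.setblocking(True)

    # 2) After accept(), the per-connection socket reads like a file.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _addr = listener.accept()
    client.sendall(b"hello\r\n")
    reader = conn.makefile("rb")
    line = reader.readline()

    reader.close()
    conn.close()
    client.close()
    listener.close()
    return listener_readable, line


readable, line = listener_vs_connection()
print(readable, line)
```

If that is what is happening, an inetd-style setup (Accept=yes in the socket unit together with a templated testPy@.service) would give each connection its own stdin, which is closer to how xinetd ran the script. Treat that as a direction to test, not a verified fix.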

Codejoy

Asked: 2021-03-05 16:58:44 +0800 CST

Simple code running in systemd exits too fast?

0

I had this old code base running under xinetd. It was basically a simple Python script that waited for standard input to come in on a socket, did something with it, and responded on standard output. I heard xinetd is no longer used, so I tried to convert this to systemd. I can get it all set up and run systemctl start kosmos (the service name), but it exits instantly. I am curious what, if anything, I might be missing in the setup to cause this. The code is a pretty simple forever loop that listens on standard input, so I am not sure why it exits under systemd.

kosmos.service

[Unit]
Description=Kosmos Guide Camera ICC
After=network.target kosmos.socket
Requires=kosmos.socket
[Service]
ExecStart=/home/workers/miniconda2/bin/python /home/workers/kosmosICC/kcamera/kcamera.py
StandardInput=socket
[Install]
WantedBy=default.target

kosmos.socket

[Unit]
Description=Socket to KOSMOS for connection
PartOf=kosmos.service

[Socket]
ListenStream=127.0.0.1:30001

[Install]
WantedBy=sockets.target

So when I do a systemctl start kosmos I get no error, and yet systemctl status kosmos shows:

● kosmos.service - Kosmos Guide Camera ICC
   Loaded: loaded (/etc/systemd/system/kosmos.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2021-03-04 17:58:43 EST; 14s ago
  Process: 5754 ExecStart=/home/workers/miniconda2/bin/python /home/workers/kosmosICC/kcamera/kcamera.py (code=exited, status=0/SUCCESS)
 Main PID: 5754 (code=exited, status=0/SUCCESS)

Mar 04 17:58:42 dhcp-093.apo.nmsu.edu systemd[1]: Started Kosmos Guide Camera ICC.

I looked for where systemd might put logs in /var/log but didn't see anything that looked like it would give a clue as to what is going on (there was no kosmos.service.log or anything). I am trying to find the logs to see whether systemd sees the code bomb out (run by hand, the code works fine).

This is the main loop snippet of the code I am trying to run:

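import sys
from traceback import format_exc


def run():
    '''
    Run forever, listening on standard input.
    All commands are echoed, then whatever reply output, and then
    an OK.
    '''
    while 1:
        line = sys.stdin.readline()


try:
    run()
except Exception:
    # print the full traceback of whatever stopped the loop
    print(format_exc())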

(There are of course more methods, etc., but this is the main one.) I suspect the except branch might be getting hit, but I am not sure how to see the format_exc() output when running from that systemd service.
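One way to find out whether the except branch fires, without depending on where the service's stdout ends up, is to append the traceback to a file. This is a minimal sketch: `run_logged`, the logger name, and the log location are all hypothetical, and `broken_main` merely stands in for the real `run()`.

```python
import logging
import os
import tempfile


def run_logged(main, log_path):
    """Call main(); on any exception, append the full traceback to log_path."""
    logger = logging.getLogger("kcamera-debug")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    try:
        main()
        return 0
    except Exception:
        # logger.exception() records the message plus the active traceback
        logger.exception("main loop died")
        return 1
    finally:
        logger.removeHandler(handler)
        handler.close()


def broken_main():
    # stand-in for run(): fail the way a bad stdin might
    raise IOError("stdin is not readable")


# Hypothetical log location; a real service would use a fixed path.
log_path = os.path.join(tempfile.mkdtemp(), "kcamera-debug.log")
status = run_logged(broken_main, log_path)
print(status)  # 1
```

The returned status can be passed to sys.exit() so that systemd also records a non-zero exit code when the loop dies.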

EDIT:

Thanks to a comment suggestion, I started the socket unit, and then it ran for a bit. I could telnet to the socket, but sending any data made it die, though it doesn't say why:

journalctl -u kosmos.service -b   

Mar 08 15:53:19 dhcp-093.apo.nmsu.edu systemd[1]: Started Kosmos Guide Camera ICC.
Mar 08 15:53:20 dhcp-093.apo.nmsu.edu systemd[1]: Started Kosmos Guide Camera ICC.
Mar 08 15:53:20 dhcp-093.apo.nmsu.edu systemd[1]: Started Kosmos Guide Camera ICC.
Mar 08 15:53:21 dhcp-093.apo.nmsu.edu systemd[1]: Started Kosmos Guide Camera ICC.
Mar 08 15:53:21 dhcp-093.apo.nmsu.edu systemd[1]: Started Kosmos Guide Camera ICC.
Mar 08 15:53:22 dhcp-093.apo.nmsu.edu systemd[1]: start request repeated too quickly for kosmos.service
Mar 08 15:53:22 dhcp-093.apo.nmsu.edu systemd[1]: Failed to start Kosmos Guide Camera ICC.
Mar 08 15:53:22 dhcp-093.apo.nmsu.edu systemd[1]: Unit kosmos.service entered failed state.
Mar 08 15:53:22 dhcp-093.apo.nmsu.edu systemd[1]: kosmos.service failed.

Does my service need a Type? Or my socket?

Is there anywhere else to look in a log to make sure it isn't my Python code bombing out (even though, again, it runs fine as a plain Python program)?

Codejoy

Asked: 2021-02-16 13:08:16 +0800 CST

Installed xinetd, started it but said 'removing' on several services including one I configured in /etc/xinetd.d/

1

I am new to xinetd, but I am trying to mimic an old machine on site that uses it. So I copied the configs from that machine (changing names where appropriate) and then tried to start xinetd. Then I realized it was not installed on my fresh CentOS 7 install, so I installed it with yum. I ran systemctl enable xinetd, then systemctl start xinetd, and then systemctl status xinetd, which is what makes my brain hurt: it shows it is removing my service (kcamera), but I have no idea why. A sudo lsof -i -P -n | grep LISTEN then shows no xinetd at all.

Curious what I am missing. (I haven't touched the firewall if that makes a difference).

[root@dhcp-093 etc]# systemctl status xinetd
● xinetd.service - Xinetd A Powerful Replacement For Inetd
   Loaded: loaded (/usr/lib/systemd/system/xinetd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-02-15 15:19:37 EST; 45min ago
  Process: 12125 ExecStart=/usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid $EXTRAOPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 12126 (xinetd)
   CGroup: /system.slice/xinetd.service
           └─12126 /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid

Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing discard
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing discard
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing echo
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing echo
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing kcamera
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing tcpmux
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing time
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: removing time
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: xinetd Version 2.3.15 started with libwrap loadavg labeled-networking options compiled in.
Feb 15 15:19:37 dhcp-093.apo.nmsu.edu xinetd[12126]: Started working: 0 available services

I had assumed, possibly wrongly, that just adding my kcamera file to /etc/xinetd.d was enough to get things rolling once xinetd started. An ls in that folder reveals a lot of files, such as tcpmux-server, which is one of the ones it said it was 'removing' above.

Not sure what else to try, look for, or configure.

xinetd.conf

#
# This is the master xinetd configuration file. Settings in the
# default section will be inherited by all service configurations
# unless explicitly overridden in the service configuration. See
# xinetd.conf in the man pages for a more detailed explanation of
# these attributes.

defaults
{
# The next two items are intended to be a quick access place to
# temporarily enable or disable services.
#
#       enabled         =
#       disabled        =
# Define general logging characteristics.
        log_type        = SYSLOG daemon info
        log_on_failure  = HOST
        log_on_success  = PID HOST DURATION EXIT

# Define access restriction defaults
#
#       no_access       =
#       only_from       =
#       max_load        = 0
        cps             = 50 10
        instances       = 50
        per_source      = 10

# Address and networking defaults
#
#       bind            =
#       mdns            = yes
        v6only          = no

# setup environmental attributes
#
#       passenv         =
        groups          = yes
        umask           = 002

# Generally, banners are not used. This sets up their global defaults
#
#       banner          =
#       banner_fail     =
#       banner_success  =
}

includedir /etc/xinetd.d

/etc/xinetd.d/kcamera

service kcamera
{
    disable     = no
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = arc
    group       = datawrite
    server      = /home/workers/kosmosICC/kcamera/kcamerad
    groups      = yes
    flags       = REUSE
    passenv     =
    umask       = 0002
    log_on_failure  += USERID
    log_on_success  += PID HOST EXIT
}

A line in /etc/services:

kcamera         30001/tcp   # kosmos camera
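Since the kcamera block names no port, the service name presumably has to resolve through /etc/services. A quick way to check that lookup is Python's `socket.getservbyname`. This is only a sanity check of the name-to-port mapping, not a diagnosis of the 'removing' messages; `resolve_service_port` is a made-up helper name.

```python
import socket


def resolve_service_port(name, proto="tcp"):
    """Return the port registered for a service name, or None if unknown."""
    try:
        return socket.getservbyname(name, proto)
    except OSError:
        # raised when the services database has no matching entry
        return None


# Prints 30001 only on a machine where the /etc/services line is in place.
print(resolve_service_port("kcamera"))
print(resolve_service_port("no-such-service"))  # None
```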

Codejoy

Asked: 2021-02-05 15:29:22 +0800 CST

exportfs with rw not working , fstab either. Mount point still read only on my setup machine

1

I have a VM that other VMs and servers mount from. This VM has an IP of, say:

10.10.1.1

The folder that others mount is /export/images

So one VM mounts this in its fstab:

10.10.1.1:/export/images /export/images nfs rsize=32768,wsize=32768,actimeo=0,bg,intr

Great: anything that has this in its fstab (OSes I didn't build) can read and write to this directory all day!

A new PC I set up couldn't, so I thought I also had to edit /etc/exports on the 'server' VM (which is symlinked for some reason; I didn't set this machine up).

It has the right line:

/export/images 10.22.1.93(rw)

That 10.22.1.93 is the machine I set up, which has the fstab entry:

10.10.1.1:/export/images /export/images nfs rsize=32768,wsize=32768,actimeo=0,bg,intr

On this machine, mount -a mounts everything from fstab, but when I go to /export/images and try to touch a file, it fails with 'Read-only file system'.

I even unmounted and ran mount -a as root again.

On the 'server' I ran exportfs -a before trying all this. Still no go. Do I have to reboot my newly set up PC to get it to take? As far as I can tell, this should be letting my new machine read and write to the mount from its fstab.
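To rule out the client side, it can help to look at what the fstab entry actually requests. The sketch below splits the entry from above into its fields; `parse_fstab_entry` is a made-up helper, not part of any mount tooling.

```python
def parse_fstab_entry(line):
    """Split one fstab line into device, mount point, type and option list."""
    fields = line.split()
    options = fields[3].split(",") if len(fields) > 3 else ["defaults"]
    return {
        "device": fields[0],
        "mount_point": fields[1],
        "fs_type": fields[2],
        "options": options,
    }


FSTAB_LINE = ("10.10.1.1:/export/images /export/images nfs "
              "rsize=32768,wsize=32768,actimeo=0,bg,intr")

entry = parse_fstab_entry(FSTAB_LINE)
print(entry["options"])
print("ro" in entry["options"])  # False
```

Neither ro nor rw appears in the options, and mount defaults to read-write, so the client is not the one asking for a read-only mount. That points back at what the server actually exports to this client's address, which is an inference from the entry and not something verified here.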

Codejoy

Asked: 2020-11-25 13:33:44 +0800 CST

Cannot virsh start my VM after a reboot of physical machine..error only a single IDE controller is supported?

0

I have a VM running under KVM (it is a .img file). The server it runs on crashed hard and restarted, and I finally got things up... I think I goofed and ran

yum install -y qemu-kvm

which I am sure updated a ton of stuff on a very old OS that had never been updated. After the machine died I had trouble getting it to see that the VMs were there at all (the image files were there, but they were not 'registered' anywhere). I am not sure how I got that back, but all the VMs started with virsh start except one, which gives an error:

[root@sdss4-server1 ~]# virsh start sdss-host2
setlocale: No such file or directory
error: Failed to start domain sdss-host2
error: unsupported configuration: Only a single IDE controller is supported for this machine type

Did my file get corrupted? Can I repair it? I would LOVE to get this VM running again, as there was no backup and some of the data needs to come off it. (I thought I could mount .img files in Linux somehow, so I tried the following.)

kpartx -av sdsshost2.img
add map loop0p1 (253:2): 0 208782 linear /dev/loop0 63
add map loop0p2 (253:3): 0 10490445 linear /dev/loop0 208845
device-mapper: resume ioctl on loop0p3  failed: Invalid argument
create/reload failed on loop0p3
add map loop0p3 (0:0): 0 62916711 linear /dev/loop0 10699290


[root@sdss4-server1 vm-cache]# sudo mount /dev/mapper/loop0p2 /mnt/host2
mount: unknown filesystem type 'swap'

[root@sdss4-server1 vm-cache]# sudo mount /dev/mapper/loop0p1 /mnt/host2

(I think this is the boot partition; the files in it are:)


System.map                      initrd-2.4.21-27.0.2.EL.img             vmlinux-2.4.21-27.0.2.ELsmp
System.map-2.4.21-27.0.2.EL     initrd-2.4.21-27.0.2.ELsmp.img          vmlinux-2.4.21-32.0.1.EL
System.map-2.4.21-27.0.2.ELsmp  initrd-2.4.21-32.0.1.EL.img             vmlinux-2.4.21-32.0.1.ELsmp
System.map-2.4.21-32.0.1.EL     initrd-2.4.21-32.0.1.ELsmp.3w-9xxx.img  vmlinuz-2.4.21-27.0.2.EL
System.map-2.4.21-32.0.1.ELsmp  initrd-2.4.21-32.0.1.ELsmp.img          vmlinuz-2.4.21-27.0.2.ELsmp
config-2.4.21-27.0.2.EL         initrd-2.4.21-52.ELBOOT.img             vmlinuz-2.4.21-32.0.1.EL
config-2.4.21-27.0.2.ELsmp      kernel.h                                vmlinuz-2.4.21-32.0.1.ELsmp
config-2.4.21-32.0.1.EL         message                                 vmlinuz-2.4.21-52.ELBOOT
config-2.4.21-32.0.1.ELsmp      message.ja
grub                            vmlinux-2.4.21-27.0.2.EL

Certainly not what I was looking for. It appears loop0p3 is what I want to mount, but kpartx gives an error on it that I do not fully understand.

So is my virtual disk bad? Is there anything I can do to recover it somehow?
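The kpartx lines above give each partition's length and start in 512-byte sectors, so the byte range each mapping expects inside the image can be worked out directly. A small sketch of that arithmetic (the regular expression and the helper name are illustrative only):

```python
import re

SECTOR_BYTES = 512  # device-mapper tables count in 512-byte sectors

# The three "add map" lines from the kpartx output above.
KPARTX_OUTPUT = """\
add map loop0p1 (253:2): 0 208782 linear /dev/loop0 63
add map loop0p2 (253:3): 0 10490445 linear /dev/loop0 208845
add map loop0p3 (0:0): 0 62916711 linear /dev/loop0 10699290
"""

LINE_RE = re.compile(r"add map (\S+) \(\d+:\d+\): 0 (\d+) linear \S+ (\d+)")


def partition_extents(text):
    """Return {name: (start_byte, end_byte)} for each 'add map' line."""
    extents = {}
    for name, length, start in LINE_RE.findall(text):
        start_byte = int(start) * SECTOR_BYTES
        end_byte = start_byte + int(length) * SECTOR_BYTES
        extents[name] = (start_byte, end_byte)
    return extents


for name, (start, end) in sorted(partition_extents(KPARTX_OUTPUT).items()):
    print(name, start, end)
# loop0p1 32256 106928640
# loop0p2 106928640 5478036480
# loop0p3 5478036480 37691392512
```

If sdsshost2.img is shorter than the end byte computed for loop0p3 (compare with os.path.getsize on the image), the third mapping would run past the end of the loop device. That is one possible reason for the 'Invalid argument', but it is a guess to check, not something the output proves.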

The definition file for sdss-host2:

<domain type='kvm'>
  <name>sdss-host2</name>
  <uuid>36637cc5-63a1-4485-9a41-31afafb352dd</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static' cpuset='21'>1</vcpu>
  <cputune>
    <emulatorpin cpuset='21'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>coreduo</model>
    <vendor>Intel</vendor>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/vm-cache/sdsshost2.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sdi'/>
      <target dev='hdb' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='ide' index='1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:2d:d1:ee'/>
      <source dev='enp4s0f0' mode='vepa'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1' keymap='en-us'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>
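The error mentions IDE controllers, and the definition above declares two of them (index 0 and index 1). A short sketch that lists them with Python's standard XML parser; the XML here is a trimmed copy of the definition with only the controller elements kept, and `ide_controller_indexes` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain definition: controllers only.
DOMAIN_XML = """
<domain type='kvm'>
  <name>sdss-host2</name>
  <devices>
    <controller type='usb' index='0' model='piix3-uhci'/>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'/>
    <controller type='ide' index='1'/>
  </devices>
</domain>
"""


def ide_controller_indexes(xml_text):
    """Return the index attribute of every IDE controller in a domain XML."""
    root = ET.fromstring(xml_text)
    return [
        controller.get("index")
        for controller in root.findall("./devices/controller")
        if controller.get("type") == "ide"
    ]


print(ide_controller_indexes(DOMAIN_XML))  # ['0', '1']
```

The same function can be pointed at the output of virsh dumpxml sdss-host2. Whether dropping the second controller is the right fix is not established here; the sketch only shows where the definition and the error message line up.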

Codejoy

Asked: 2020-10-14 16:06:38 +0800 CST

I had some drbd (i guess block devices?) on my server, after a crash they are all gone

1

I am not a sysadmin, but I inherited some Linux servers that were set up with no documentation. Today the server died: it was unresponsive and the VMs running on it were down. After a good few hours the server rebooted itself, so I could ssh to it again, but I realized that what used to show up in /dev as

/dev/drbd0 /dev/drbd1 etc etc

are no longer there at all. I am guessing a drive or a series of drives went kaput. The command

cli64 vsf info

shows that my Areca disk array is checking three (volumes? block devices? thingies?) and doing it very slowly:

  # Name             Raid Name       Level   Capacity Ch/Id/Lun  State
===============================================================================
  1 ARC-1883-VOL#000 vm-cache        Raid3    300.0GB 00/00/00   Normal
  2 ARC-1883-VOL#001 data            Raid6   12000.0GB 00/00/01   Checking(50.4%)
  3 ARC-1883-VOL#002 apogee          Raid6   9000.0GB 00/00/02   Checking(50.6%)
  4 ARC-1883-VOL#004 database        Raid1+0 3000.0GB 00/00/03   Normal
  5 ARC-1883-VOL#005 system          Raid1+0 3000.0GB 00/00/04   Normal
  6 ARC-1883-VOL#006 archive         Raid6   6000.0GB 00/00/05   Checking(74.3%)
  7 VM-Cache Backup  VM-Cache Backup Raid1+0 2000.0GB 00/00/06   Normal
  8 VS Apogee Backup RS Apogee BackupRaid0   3000.0GB 00/00/07   Normal
  9 ARC-1883-VOL#008 TPM             Raid1+0 1500.0GB 00/01/00   Normal
 10 SDSS-BACKUP-VOLU SDSS-BACKUP-RAIDRaid0   1000.0GB 00/01/01   Normal
===============================================================================
GuiErrMsg<0x00>: Success.

It is my hope that once the checks are done I will once again see the /dev/drbd devices, so I can mount them and get my VM image files off of them, though I think that is wishful thinking. I am not sure what else to poke at to get drbd to exist in my /dev directory again.

Normally the commands to get the VMs set up and ready to use are:

drbdadm primary --force all

mount -o noatime /dev/drbd/by-res/vm-cache /vm-cache

Then, lo and behold, /vm-cache has all the .img files. With /dev/drbd missing, though, this mount is of course failing.

Codejoy

Asked: 2020-06-23 11:32:01 +0800 CST

postfix smtp slow to send from clients (and sometimes does not authenticate), works fine with 'squirrelmail'?

0

I inherited a long-running Postfix box with Courier IMAP that handles run-of-the-mill SMTP. It seems to use fail2ban for firewall protection of some sort (I have never used fail2ban), and according to /var/log/messages we are getting a ton of attempts on our box, which is probably normal.

htop on the mail machine looks fine, except maybe for memory, with 2.53 out of 6ish being used, which seems kind of high. Regardless, when a client clicks send, the email now takes many minutes to finally 'send'. A lot of users also got signed out of their clients (I did, in the Outlook app on my Android phone). I removed the account from the phone completely and tried to add it again, and it will not connect to the outgoing mail server, even though I know for a fact the password is right.

(I could see in /var/log/maillog entries like this)

2020-06-22 09:45:45.459 xmail postfix/smtpd[2592]: connect from c-98-230-220-31.hsd1.nm.comcast.net[98.230.224.38]

2020-06-22 09:48:39.527 xmail imapd: Connection, ip=[::ffff:98.230.224.38]

2020-06-22 09:48:40.006 xmail imapd: LOGIN, [email protected], ip=[::ffff:98.230.224.38], port=[50411], protocol=IMAP

2020-06-22 09:48:55.932 xmail postfix/smtpd[2592]: lost connection after AUTH from c-98-230-220-31.hsd1.nm.comcast.net[98.230.224.38]

...is what I see when I try to get my mobile to connect. (On the mobile side it just fails and says it timed out.)
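The timestamps in those entries already quantify the stall: the gap between the smtpd connect and the "lost connection after AUTH" line is a little over three minutes. A small sketch of that calculation, using the two smtpd lines exactly as logged (`log_time` is a made-up helper):

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' stamp of a maillog line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(date_part + " " + time_part, TIMESTAMP_FORMAT)


connect_line = ("2020-06-22 09:45:45.459 xmail postfix/smtpd[2592]: connect from "
                "c-98-230-220-31.hsd1.nm.comcast.net[98.230.224.38]")
lost_line = ("2020-06-22 09:48:55.932 xmail postfix/smtpd[2592]: lost connection "
             "after AUTH from c-98-230-220-31.hsd1.nm.comcast.net[98.230.224.38]")

stall = log_time(lost_line) - log_time(connect_line)
print(stall)  # 0:03:10.473000
```

A session that sits for three minutes before the client gives up reads more like the server waiting on something than like a rejected password, but that is an interpretation of the timing, not a confirmed cause.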

I am not sure where to start looking for the cause of the slowdown, or for why some clients cannot connect, or got disconnected and cannot reconnect.

To me all this feels like a certificate issue somewhere (on the server or the client), but I am not sure where to look or how to check that theory. We do have a weekly cron job that does a cert-bot auto renew, then changes directory to /etc/letsencrypt/live/xmail.... and copies privkey.pem, cert.pem and fullchain.pem all into a courrier.pem. Then it restarts courier-imap-ssl and pop3d-ssl.

I have looked through the logs. They don't show any explicit errors, but there are a lot of entries like the ones above in the maillog (about connections being refused or dropped, etc.).

When using SquirrelMail, there are no issues connecting or sending mail; it works like a charm.

All of these issues sprang up about a week to a week and a half ago; before that, things had apparently worked well for years.

centos box courier imap postfix/smtp

Also got a new message in my inbox when i tried to send an email from my client:

Your message did not reach some or all of the intended recipients.

  Subject:  postfix issue
  Sent: 6/22/2020 1:31 PM

The following recipient(s) cannot be reached:

  '[email protected]' on 6/22/2020 1:35 PM
        Server error: '451 4.3.0 <[email protected]>: Temporary lookup failure'

Is this a DNS issue with our dns server maybe?

Shane

P.S. I just did a simple setup of Mutt on my WSL install of Ubuntu. I think I configured it correctly, and now hitting send always says:

Could not connect to mysmtp.blah.com (Resource temporarily unavailable) .

So maybe that explains why clients like Thunderbird take forever to send an email out? But I have no idea what would all of a sudden cause this slowness (I have also restarted the VM that IMAP/Postfix run on several times).

Some log entries from grepping /var/log/maillog for error:

2020-06-23 06:47:26.253 xmail amavis[7427]: (07427-01-7) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:47:41.895 xmail amavis[7111]: (07111-02-3) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:47:57.085 xmail amavis[7427]: (07427-01-8) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:48:11.329 xmail amavis[7111]: (07111-02-4) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:48:27.362 xmail amavis[7427]: (07427-01-9) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:48:48.961 xmail amavis[7111]: (07111-02-5) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:48:56.723 xmail amavis[7427]: (07427-01-10) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:49:21.196 xmail amavis[7111]: (07111-02-6) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:49:29.393 xmail amavis[7427]: (07427-02) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:49:51.207 xmail amavis[7111]: (07111-02-7) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:50:00.136 xmail amavis[7427]: (07427-02-2) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:50:29.001 xmail amavis[7111]: (07111-02-8) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:50:31.521 xmail amavis[7427]: (07427-02-3) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n
2020-06-23 06:51:05.472 xmail amavis[7111]: (07111-02-9) (!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamd.amavisd/clamd.sock (All attempts (1) failed connecting to /var/run/clamd.amavisd/clamd.sock) at (eval 132) line 659.\n

Also a tail of /var/log/maillog (SquirrelMail has now been taking a while to send outgoing messages too):

2020-06-23 07:04:35.879 xmail imapd: LOGIN FAILED, [email protected], ip=[::ffff:127.0.0.1]
2020-06-23 07:04:36.690 xmail imapd: LOGIN FAILED, [email protected], ip=[::ffff:127.0.0.1]
2020-06-23 07:04:36.977 xmail imapd: Disconnected, ip=[::ffff:127.0.0.1], time=7
2020-06-23 07:04:36.988 xmail postfix/smtpd[5865]: warning: unknown[46.38.148.2]: SASL LOGIN authentication failed: authentication failure
2020-06-23 07:04:36.989 xmail postfix/smtpd[5865]: disconnect from unknown[46.38.148.2]
2020-06-23 07:04:36.990 xmail postfix/smtpd[5865]: connect from unknown[46.38.148.10]
2020-06-23 07:04:36.990 xmail postfix/smtpd[5865]: disconnect from unknown[46.38.148.10]
2020-06-23 07:04:36.990 xmail postfix/smtpd[5865]: connect from unknown[46.38.145.6]
2020-06-23 07:04:36.995 xmail imapd: Connection, ip=[::ffff:127.0.0.1]

By the way, we have seen a ton of attempts against these @nmsu.edu addresses; it is almost like someone is taking a dictionary of names, appending @nmsu.edu, and seeing what sticks. We have fail2ban running somehow on this server (I am constantly learning more about it through this).

Codejoy

Asked: 2020-04-22 20:55:59 +0800 CST

I tried postfix restart, said service isn't running. Yet I am on my working mail server? (cent os)

0

So I am not sure what I am doing wrong. This all started with the need to let others from outside our network access the SMTP to send email from offsite.

So I had in /etc/postfix a check_clients file, for a test I added my external IP address of my house.

Then I ran:

postmap check_clients to make/update the client.db I suspect I have to update or restart postfix to get this to take? But not sure how since it claims it is not running. I inherited this email server and really don't know what is going on with it but trying to come up to speed fast. It looks like it is running courier-imap.

A ps aux | grep postfix gives nothing; a ps aux | grep smtp gives:

postfix  23310  0.0  0.1 192172  7376 ?        S    04:36   0:00 smtpd -n smtp -t inet -u -o stress= -o smtpd_sasl_auth_enable=yes -o receive_override_options=no_address_mappings -o content_filter=smtp-amavis:127.0.0.1:10024
postfix  23330  0.0  0.0  92392  4840 ?        S    04:36   0:00 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o disable_dns_lookups=yes
postfix  23334  0.0  0.1 174872  6888 ?        S    04:36   0:00 smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter= -o local_recipient_maps= -o relay_recipient_maps= -o smtpd_restriction_classes= -o smtpd_client_restrictions= -o smtpd_helo_restrictions= -o smtpd_sender_restrictions= -o smtpd_recipient_restrictions=permit_mynetworks,reject -o mynetworks=127.0.0.0/8 -o strict_rfc821_envelopes=yes -o smtpd_error_sleep_time=0 -o smtpd_soft_error_limit=1001 -o smtpd_hard_error_limit=1000
postfix  23388  0.0  0.1 192172  7380 ?        S    04:37   0:00 smtpd -n smtp -t inet -u -o stress= -o smtpd_sasl_auth_enable=yes -o receive_override_options=no_address_mappings -o content_filter=smtp-amavis:127.0.0.1:10024
postfix  24037  0.0  0.1 118332  6812 ?        S    04:49   0:00 smtpd -n smtps -t inet -u -o stress= -o smtpd_tls_wrappermode=yes -o smtpd_sasl_auth_enable=yes
postfix  24045  0.0  0.0  92392  4840 ?        S    04:49   0:00 smtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o disable_dns_lookups=yes
postfix  24111  0.0  0.1 100880  6012 ?        S    04:50   0:00 smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter= -o local_recipient_maps= -o relay_recipient_maps= -o smtpd_restriction_classes= -o smtpd_client_restrictions= -o smtpd_helo_restrictions= -o smtpd_sender_restrictions= -o smtpd_recipient_restrictions=permit_mynetworks,reject -o mynetworks=127.0.0.0/8 -o strict_rfc821_envelopes=yes -o smtpd_error_sleep_time=0 -o smtpd_soft_error_limit=1001 -o smtpd_hard_error_limit=1000
root     24131  0.0  0.0 112788   684 pts/0    R+   04:51   0:00 grep --color=auto smtp

I am not sure why I cannot send email from my Mac at home, when it sends fine while I am on site. From home, the SMTP connection times out. I thought the cause was the check_clients file not having my home IP address, but I could also be way off, because for some odd reason I can send and receive email from Outlook on my Android phone no matter what network I am on.

I am 99.9% sure I have the right connection settings in my Apple Mail app too, and I am not sure what else to look at or try. Some users claim they can send from Mac Mail or Thunderbird etc. when offsite, and others cannot. That is why I was looking at this article:

http://www.postfix.org/RESTRICTION_CLASS_README.html

I want to try changing smtpd_recipient_restrictions to add my IP address too; I am just not sure how to get whatever mail server I am running to pick that up, since postfix reload failed, claiming it is not running :/

Codejoy

Asked: 2019-09-04 08:36:09 +0800 CST

apache does not see my new certs, still has expired certs

0

In typing this question I found this:

Apache seems to be using old expired certificate even though new one is installed

His issue matches mine to a T, and I tried more or less everything he tried. The difference is that his was solved because he had nginx running; in my case there is no such reverse proxy. So I just cannot get Apache to see the new certs I got using certbot. (That was a whole other issue: certbot auto renew didn't work and gave errors, so I did a certbot certonly for Apache and pointed Apache's ssl-certs in /etc/httpd/extra to the new location.)

I tried everything else he did. I moved the folder that /etc/httpd/extra/ssl-certs and ssl-certs-proxy were pointing to into /tmp, and had those files point to the new .pem location:

SSLCertificateFile /etc/letsencrypt/live/www.apo.nmsu.edu/cert.pem
SSLCertificateKeyFile /etc/letsencrypt/live/www.apo.nmsu.edu/privkey.pem
SSLCertificateChainFile /etc/letsencrypt/live/www.apo.nmsu.edu/chain.pem
Include /etc/letsencrypt/options-ssl-apache.conf

cert.pem -> ../../archive/www.apo.nmsu.edu/cert2.pem
chain.pem -> ../../archive/www.apo.nmsu.edu/chain2.pem
fullchain.pem -> ../../archive/www.apo.nmsu.edu/fullchain2.pem
privkey.pem -> ../../archive/www.apo.nmsu.edu/privkey2.pem

Alas, nothing: no change, the websites still report the expired certificate, which was in another folder, /live/apo.nmsu.edu-0004, that I moved to /tmp. So I am not sure how Apache is still picking all that up.

I did an apachectl stop and apachectl start, and even a restart, and also reset the VM this is all running on.

Same issue. I am completely out of ideas. I even checked the new .pem files using openssl, and they correctly expire in 90 days (they are from Let's Encrypt).
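One way to narrow this down is to compare the expiry date of the file on disk with the expiry date of the certificate the server actually hands out. The sketch below uses the standard openssl x509 and s_client subcommands; the function names cert_enddate and served_enddate are made up for illustration, and the host passed to the second one is whatever site is being tested.

```shell
# Print the notAfter (expiry) date of a PEM certificate file.
# cert_enddate is a hypothetical helper name.
cert_enddate() {
    openssl x509 -noout -enddate -in "$1"
}

# Print the notAfter date of the certificate a running server presents.
# $1 is the host name, $2 the port (defaults to 443).
# served_enddate is a hypothetical helper name.
served_enddate() {
    echo | openssl s_client -connect "$1:${2:-443}" -servername "$1" 2>/dev/null \
        | openssl x509 -noout -enddate
}

# Usage (path and host taken from the question):
#   cert_enddate /etc/letsencrypt/live/www.apo.nmsu.edu/cert.pem
#   served_enddate www.apo.nmsu.edu
```

If the two dates differ, the process answering on port 443 is reading a different certificate file than the one just checked.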

also:

[root@web-server extra]# apachectl -v
Server version: Apache/2.4.6 (Scientific Linux)
Server built:   Jul 29 2019 10:53:12

Codejoy

Asked: 2019-09-02 22:23:59 +0800 CST

certbot-auto renew fails

1

I inherited a web server that uses Let's Encrypt with certbot. At first it seemed straightforward, but running certbot-auto renew fails. I then ran certbot-auto certonly --apache, and that downloaded a cert just fine (running renew again then picks it up and even says it is new and doesn't need renewal). I am not sure what I am missing or have yet to learn, but some of the failure messages are below (names changed to protect the innocent):

Saving debug log to /var/log/letsencrypt/letsencrypt.log



- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/xyx.someaddress.com-0004.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert is due for renewal, auto-renewing...
Plugins selected: Authenticator webroot, Installer None
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for xyx.someaddress.com
Waiting for verification...
Challenge failed for domain xyx.someaddress.com
Cleaning up challenges
Attempting to renew cert (xyx.someaddress.com-0004) from /etc/letsencrypt/renewal/xyx.someaddress.com-0004.conf produced an unexpected error: Some challenges have failed.. Skipping.
All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/xyx.someaddress.com-0004/fullchain.pem (failure)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
IMPORTANT NOTES:
 - The following errors were reported by the server:
1 renew failure(s), 0 parse failure(s)
Domain: xyx.someaddress.com
   Type:   unauthorized
   Detail: Invalid response from
   https://xyx.someaddress.com/.well-known/acme-challenge/oMvZoCPBM8qZjYcIOlSHs0SLophprew9-c9zASc9d1s
   [192.41.211.157]: "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML
   2.0//EN\">\n<html><head>\n<title>404 Not
   Found</title>\n</head><body>\n<h1>Not Found</h1>\n<p"

   To fix these errors, please make sure that your domain name was
   entered correctly and the DNS A/AAAA record(s) for that domain
   contain(s) the right IP address.

So the biggest thing is the 'To fix these errors' note, but what does it mean that the domain name was entered correctly... where? And that the DNS A/AAAA records contain the right IP address? I have no idea where to check that. The conf file has many domain names in it (omitted here), but looks like this:

[renewalparams]
authenticator = webroot
account = ******************
server = https://acme-v02.api.letsencrypt.org/directory
[[webroot_map]]
xyx.someaddress.com = /var/www/html
www.xyx.someaddress.com = /var/www/html
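Since the failure above is a 404 on the challenge URL, one quick check is whether a file placed under the configured webroot is actually reachable over HTTP. The sketch below only creates a probe file in the place the webroot authenticator would use; make_acme_probe and probe.txt are invented names for illustration, and the curl line is left as a comment because it depends on the real host name.

```shell
# Create a probe file under <webroot>/.well-known/acme-challenge/ and
# print the URL path it should be served at.
# make_acme_probe and probe.txt are hypothetical names.
make_acme_probe() {
    mkdir -p "$1/.well-known/acme-challenge" || return 1
    echo "probe-ok" > "$1/.well-known/acme-challenge/probe.txt" || return 1
    echo "/.well-known/acme-challenge/probe.txt"
}

# Usage (webroot and host taken from the renewal config above):
#   path=$(make_acme_probe /var/www/html)
#   curl -i "http://xyx.someaddress.com$path"
```

If curl also returns 404 for the probe, the virtual host answering for that name is probably not serving /var/www/html, which would explain the failed http-01 challenge.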

Running the standard certbot-auto with the certonly for apache created a new folder in:

/etc/letsencrypt/live

and that folder had the latest .pem files, so I went to the only place I saw where I could change the SSLCertificateFile, which was in:

/etc/httpd/extra/ssl-certs and also /etc/httpd/extra/ssl-certs-proxy.

The files now point to the new location of the .pem files (which are symlinked from the live folder).

Running openssl on these .pem files, it does seem like they expire correctly now, yet when I go to:

https://www.ssllabs.com/ssltest

and put in my live site, it says the cert was valid until yesterday and that it is expired. I cannot figure out why my Apache insists on using the old certs. Is there a cache to clear out?
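Apache has no certificate cache that survives a restart, so a more likely explanation is another config file that still names the old path. As a rough sketch, every active SSLCertificate* directive under the config tree can be listed with grep; find_cert_directives is an invented helper name, and /etc/httpd is the config root mentioned in the question.

```shell
# List every uncommented SSLCertificateFile / SSLCertificateKeyFile /
# SSLCertificateChainFile directive under a config directory,
# with file name and line number.
# find_cert_directives is a hypothetical helper name.
find_cert_directives() {
    grep -rnE '^[[:space:]]*SSLCertificate(File|KeyFile|ChainFile)[[:space:]]' "$1"
}

# Usage:
#   find_cert_directives /etc/httpd
```

Running apachectl -S alongside this shows which virtual hosts are defined and in which files, which helps match each directive to the site that is still serving the expired cert.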

Also to note, the ssl_error_log in /etc/httpd/logs gives a lot of these warnings (not sure if this is relevant):

[Mon Sep 02 06:43:22.246692 2019] [ssl:warn] [pid 4478] AH01909: RSA certificate configured for web-server.xxx.yyy.zzz:443 does NOT include an ID which matches the server name

I did make sure .well-known/acme-challenge was writable (I just made it chmod 777 for now to make triple sure; I won't keep it like that, of course). Though again, I have all new certs (ditching the renewal option) and Apache still isn't using them.

Codejoy

Asked: 2019-06-21 07:41:54 +0800 CST

virt-install hangs, no apparent error in logs and virsh list later shows its running after ctrl c exit, no domifaddr though

3

I got thrown into managing boxes where the person before me used virt, so I am trying to come up to speed. As a test I am doing a virt-install like this:

virt-install --virt-type=kvm --name kosmos-icc --ram 1000 --os-variant=centos7.0 --cdrom=/var/lib/libvirt/boot/CentOS-7-x86_64-Minimal-1810.iso --network=bridge=virbr0,model=virtio --graphics vnc --disk path=/var/lib/libvirt/images/centos7.qcow2,size=8,bus=virtio,format=qcow2 --boot userserial=on

It runs with this:

WARNING  Graphics requested but DISPLAY is not set. Not running virt-viewer.
WARNING  No console to launch for the guest, defaulting to --wait -1

Starting install...
Allocating 'centos7.qcow2'                                                                                                                                                | 8.0 GB  00:00:00     
Domain installation still in progress. Waiting for installation to complete.

Then it hangs. I can hit Ctrl-C and get my prompt back. A virsh list shows that it is running, but virsh domifaddr kosmos-icc shows nothing (the other VM, a generic one installed using the GUI, shows the IP address, which I can ssh into from the machine).

So I am not sure why it isn't completing, or whether it is completing and being silent about it, or whether I am missing a switch. I was assuming virbr0 was the way to go for the network. I am still learning virsh/virt and seeing whether I can install a VM from the command line and then replicate the process on a non-test machine.

The install logs in /root/.cache/virt-manager show no real errors. In fact they show:
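Given the "Graphics requested but DISPLAY is not set" warning, the installer may simply be sitting at its first screen on the guest's VNC console, waiting for input. virsh vncdisplay reports the display as a string such as ":0" or "127.0.0.1:0", and VNC display N listens on TCP port 5900+N. A small sketch of that conversion follows; vnc_port is an invented helper name.

```shell
# Convert a VNC display string (":1" or "127.0.0.1:1") to its TCP port.
# VNC display N corresponds to port 5900 + N.
# vnc_port is a hypothetical helper name.
vnc_port() {
    display=${1##*:}          # keep only the number after the last colon
    echo $((5900 + display))
}

# Usage (needs libvirt; domain name taken from the question):
#   vnc_port "$(virsh vncdisplay kosmos-icc)"
```

Pointing a VNC client at that port on the host (through an SSH tunnel if it only listens on 127.0.0.1) should show whatever the installer is waiting on.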

[Wed, 19 Jun 2019 11:28:38 virt-install 351] DEBUG (guest:441) XML fetched from libvirt object:

... the xml ...


[Wed, 19 Jun 2019 11:28:38 virt-install 351] DEBUG (virt-install:744) Domain state after install: 1

That state-after-install line is the last one before the log records my Ctrl-C keyboard interrupt.
