Sunday, October 10, 2021

Recent Questions - Server Fault

How can I mitigate the risks of intentionally allowing users to run code on a server, in a resource-efficient manner?

Posted: 10 Oct 2021 10:35 PM PDT

To summarize:

I want to run multiple servers on a host, for different groups of users, and allow them to add and modify functionality of the servers at runtime with some form of scripting. This is a 100% necessary feature for what I want to do, but there is obviously a lot of potential for this to go horribly wrong; what are some actions/approaches I can take to mitigate the risks involved?

It's probably worth mentioning that I am not a system administrator and am most likely unaware of many best practices taken for granted by those that are; my ideas are based on what seems logical to me given what little I know - if any of them seem misguided or if I'm missing some important ideas, please assume I am ignorant of what should be done and let me know.

To elaborate:

Users will connect to a master server, which will handle authentication and facilitate accessing their scriptable servers of choice. I'm expecting that users will establish a connection to a scriptable server and afterward communicate directly with it for a session; that seems logical from the point of view of an application developer, but I'm not sure whether it's a bad security practice.

All the servers will run Linux and will be run on a Linux host.

The master server will have a database, probably PostgreSQL, with some user-related data e.g. for authentication. It will probably also have some data regarding the scriptable servers so it can advertise them and handle connecting users to them.

The scriptable servers will need some user-related data as well, but mostly will contain user-created content; each scriptable server will store that data in an SQLite database.

Users should not be able to run any code on the master server.

On the scriptable servers, I envision them using a language like Tcl or Lua, which can be embedded and allow exposing only part of their functionality to users. Tcl does this via a "safe interpreter"; Lua apparently has some sandboxing capability.

I don't expect these language features to completely protect my servers from being compromised.

I am considering running all the servers in their own rootless docker containers. I know this is not enough to contain a compromise, but my understanding is that it should help.
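
For illustration, here is a minimal sketch of how I picture launching one scriptable server in such a container, with its privileges and resources capped; the image name, port, and limit values are placeholders, not an actual setup:

# one scriptable server per group, rootless, with tight limits (values are examples)
docker run -d --name group1-server \
  --read-only --tmpfs /tmp \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --memory 256m --cpus 0.5 --pids-limit 100 \
  -p 4001:4000 \
  myorg/scriptable-server:latest

My hope is that the memory, CPU, and PID caps at least keep one compromised server from starving the others on the same host.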

The next part is where I am especially unsure what to do:

Running each server in its own virtual machine with VirtualBox or the like would provide what I imagine would be very good isolation, but it would also consume a lot of resources. I need to avoid that, or else find a Linux distro that only consumes something like 5-20 MB of RAM, which seems unlikely to me and would still probably be far more than the servers themselves use, which would be very unappealing.

Is there a good answer or alternative to this?

Ultimately, while I hope to avoid any system being compromised, I expect it to happen at some point or another; if somebody wants to do that, they're going to figure it out.

I'm looking for a resource-efficient way to make it difficult for a single compromised server to lead to others being compromised; even better would be if I could keep the compromised server from being used to wreak havoc across the Internet. I'd also like to have a good way to recover from this; I intend to take backups and send them elsewhere, but I'd be interested in any other suggestions - I could see that being another question, however.

OpenSSH server doesn't allow multiple logins on a LAN when connecting to the public IP

Posted: 10 Oct 2021 10:10 PM PDT

I have an OpenSSH server set up on Linux Mint, with SSH keys and a ~/.ssh/config file on my Mac, where I do most of my work.

If I am at home on a LAN with the server and run ssh my_username@my_public_IP it works the first time, but hangs the second time.

However, the same command when not on the LAN works as expected, allowing me to make multiple ssh connections simultaneously.

Why does this happen, and how can I avoid this behavior so I don't need to run a different ssh command depending on where I am?
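
For reference, something like the following ~/.ssh/config sketch is the kind of single-alias setup I am after; the alias, interface name, LAN prefix, and addresses below are placeholders rather than my real values:

# use the LAN address when the Mac is on the home network, the public IP otherwise
Match originalhost myserver exec "ipconfig getifaddr en0 | grep -q '^192\.168\.1\.'"
    HostName 192.168.1.50
Host myserver
    HostName my_public_IP
    User my_username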

Unable to scrape the kubelet API from Prometheus

Posted: 10 Oct 2021 09:53 PM PDT

I am setting up Prometheus to scrape a Kubernetes cluster. I am trying to use "role: node" with kubernetes_sd_config to monitor one of the K8s clusters. I created a certificate, ashishcert.pem, for the user "ashish", and Prometheus will use this cert to scrape the cluster. The certificate is signed by the cluster CA.

Prometheus.yml

Now when I look back at Prometheus, it says "cannot validate certificate x.x.x because it does not contain any IP SANs".

result on prometheus side

The port number shown in the image is the kubelet's, which means Prometheus is unable to scrape kubelet metrics for any of the nodes in the cluster, even though I have added all the node names and IPs to the SAN of the certificate.

I validated my certificate by checking the metrics of the apiserver using my cert and the CA cert with the command below.

curl -v https://myclustername:6443/metrics --cacert ca.pem --cert ashishcert.pem --key ashishkey.pem  

The above command worked successfully; my cert was accepted by the apiserver. However, when I tried to curl the kubelet metrics at https://myclustername:10250/metrics, I got an error saying the CA is not trusted. It looks like the kubelet CA is different from the apiserver CA.

result while doing curl

My understanding was that my certificate connects me (Prometheus) to the apiserver, and that all further communication is the apiserver's duty, i.e. the apiserver uses its own certificate to get the metrics from the kubelet. However, given the results of the commands above, it looks like my cert is also being authenticated directly against the kubelet. Please confirm whose certificate is used for the internal communication.

How can I scrape all the nodes with role: node without ignoring certificate validation?
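
For reference, this is roughly how I have been checking which CA signed the kubelet's serving certificate and which SANs it actually contains (the node name is a placeholder):

# show issuer, subject and Subject Alternative Names of the kubelet's serving cert
openssl s_client -connect mynode1:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -text \
  | grep -E 'Issuer:|Subject:|DNS:|IP Address:'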

Does Azure Application Insights/Monitor have a way to check uptime of external REST APIs?

Posted: 10 Oct 2021 08:49 PM PDT

With Application Insights you can ping websites from different regions, like Pingdom. However, we are looking for a way to call external (not hosted in Azure) REST endpoints, ideally being able to take the output of one, extract a token, then use it in the parameters of another. This can be done with Synthetics in New Relic (extremely expensive), or "advanced" checks in Pingdom (doesn't support the UK as a source), but we would prefer to do it via Azure.

We could set up a VM and run curl from a shell script, but this is a poor solution.

This is not to be confused with using REST APIs to access Azure Monitor; it's the opposite.

How to port forward on an ATT Router/Modem?

Posted: 10 Oct 2021 08:38 PM PDT

I am trying to open port 2222 for an OpenSSH server on my Linux machine.

I am able to log into it just fine from my local IP address.

sudo systemctl status returns:

     Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-10-10 19:25:19 PDT; 34min ago
       Docs: man:sshd(8)
             man:sshd_config(5)
    Process: 9445 ExecStartPre=/usr/sbin/sshd -t (code=exited, status=0/SUCCESS)
   Main PID: 9446 (sshd)
      Tasks: 1 (limit: 19025)
     Memory: 3.6M
     CGroup: /system.slice/ssh.service
             └─9446 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups

Oct 10 19:25:19 jacob-desktop systemd[1]: Starting OpenBSD Secure Shell server...
Oct 10 19:25:19 jacob-desktop sshd[9446]: Server listening on 0.0.0.0 port 2222.
Oct 10 19:25:19 jacob-desktop sshd[9446]: Server listening on :: port 2222.
Oct 10 19:25:19 jacob-desktop systemd[1]: Started OpenBSD Secure Shell server.
Oct 10 19:49:54 jacob-desktop sshd[9648]: Accepted publickey for jacob from 192.168.1.220 port 53539 ssh2: ED25519 SHA256:9DMi>
Oct 10 19:49:54 jacob-desktop sshd[9648]: pam_unix(sshd:session): session opened for user jacob by (uid=0)

But according to https://www.portchecktool.com/ my port 2222 is closed. My att port config is as follows:

This one doesn't work

What is so frustrating is that a port I opened using a different tool within the modem works, as shown below: This one works

I don't see why port 22 should be open and work whereas port 2222 does not.
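
For what it's worth, here is roughly what I have been running on the Linux box to rule out the host side (a sketch; the firewall commands only apply if the respective tool is installed):

# confirm sshd is listening on 2222 on all interfaces
sudo ss -tlnp | grep 2222

# check whether a host firewall is dropping the port
sudo ufw status verbose
sudo iptables -L INPUT -n -v | grep 2222

# from a machine OUTSIDE the LAN, test the forwarded port
nc -vz MY_PUBLIC_IP 2222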

Is Att's modem bad, or am I making some error I don't see?

Thanks to anyone who helps with this; I've been ripping out my hair for months on various server-building attempts.

How do I shrink my EBS volume without losing my site data

Posted: 10 Oct 2021 07:29 PM PDT

I would like to reduce my EBS volume size without losing my site. I have already tried taking snapshots and creating another, smaller volume, but all the methods I try involve using commands, which I would rather not do. If anyone knows an easier way, please help.

clamav - clamd error when set up as a daemon (macOS)

Posted: 10 Oct 2021 07:10 PM PDT

Issue:

Setting up clamav as a daemon process on macOS throws some cumbersome errors and warnings during setup, and the documentation is good but not perfect. I ran into some permission issues, file location issues, etc.

Things that are working:

  • freshclam daemon via a cron job (will post below)
  • getting the daemon to load via launchd and show via sudo launchctl list | grep clam
  • starting the daemon via launchd*

Things that are not working:

  • clamd created from launchd plist does not stay in list after starting
  • clamd starts, but returns the error below

Error:

clamdclam.log:

ERROR: LOCAL: Socket file /usr/local/etc/clamav/clamd.socket is in use by another process.  

Setup:

CONFIG_DIR="/usr/local/etc"
CLAM_HOME_DIR=~/clamav

# Make dir for configs in home dir
mkdir -p ${CLAM_HOME_DIR}

# Create configs
clamconf -g freshclam.conf > ${CLAM_HOME_DIR}/freshclam.conf
clamconf -g clamd.conf > ${CLAM_HOME_DIR}/clamd.conf
clamconf -g clamav-milter.conf > ${CLAM_HOME_DIR}/clamav-milter.conf

# Link configs
ln -nsf $(pwd)/freshclam.conf /usr/local/etc/clamav/
ln -nsf $(pwd)/clamd.conf /usr/local/etc/clamav/
ln -nsf $(pwd)/clamav-milter.conf /usr/local/etc/clamav/

# Test freshclam is working

# create freshclam log file
sudo touch /var/log/freshclam.log
sudo chmod 600 /var/log/freshclam.log
sudo chown clamav /var/log/freshclam.log

# create clamd log file
sudo touch /var/log/clamdclam.log
sudo chmod 600 /var/log/clamdclam.log
sudo chown clamav /var/log/clamdclam.log

Files:

All configs and functional files

/usr/local/etc/clamav:

ls -l /usr/local/etc/clamav/
total 472256
-rw-r--r--  1 _clamav  admin       293670 Oct 10 17:35 bytecode.cvd
lrwxr-xr-x  1 user     admin           37 Oct 10 17:14 clamav-milter.conf -> /Users/user/clamav/clamav-milter.conf
lrwxr-xr-x  1 root     admin           29 Oct 10 20:48 clamd.conf -> /Users/user/clamav/clamd.conf
-rwxrwxr-x  1 user     admin        26784 Oct  9 16:46 clamd.conf.sample
-rw-r--r--  1 root     wheel            5 Oct 10 21:09 clamd.pid
srw-rw----  1 root     wheel            0 Oct 10 20:59 clamd.socket
lrwxr-xr-x  1 user     admin           31 Oct 10 19:25 clamd_run.sh -> /Users/user/clamav/clamd_run.sh
-rw-r--r--  1 _clamav  admin     56261254 Oct 10 17:34 daily.cvd
lrwxr-xr-x  1 user     admin           33 Oct 10 17:14 freshclam.conf -> /Users/user/clamav/freshclam.conf
-rwxrwxr-x  1 user     admin         7204 Oct  9 16:46 freshclam.conf.sample
-rw-r--r--  1 _clamav  _clamav         69 Oct 10 17:34 freshclam.dat
-rw-r--r--  1 _clamav  admin    170479789 Oct 10 17:35 main.cvd

mac osx plist file /Library/LaunchDaemons/com.clamd.daemon.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>com.clamav.daemon</string>
    <key>ProgramArguments</key>
    <array>
      <string>/usr/local/Cellar/clamav/0.104.0_1/sbin/clamd</string>
      <string>-c</string>
      <string>/usr/local/etc/clamav/clamd.conf</string>
      <string>-l</string>
      <string>/var/log/clamdclam.log</string>
    </array>
    <key>KeepAlive</key>
    <dict>
      <key>Crashed</key>
      <true/>
    </dict>
    <key>StandardOutPath</key>
    <string>/tmp/test.stdout</string>
    <key>StandardErrorPath</key>
    <string>/tmp/test.stderr</string>
    <key>RunAtLoad</key>
    <true/>
    <key>LaunchOnlyOnce</key>
    <true/>
  </dict>
</plist>

Currently Testing:

  • changed file ownership: was user:wheel -> root:wheel -> root:admin

srw-rw---- 1 root wheel 0 Oct 10 20:59 clamd.socket
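
For reference, this is how I have been checking what is actually holding the socket when that error appears (a sketch; the kickstart line assumes a recent macOS launchctl):

# see which process (if any) currently owns the clamd socket
sudo lsof /usr/local/etc/clamav/clamd.socket

# if nothing owns it, it is probably a stale socket from a previous run
sudo rm /usr/local/etc/clamav/clamd.socket
sudo launchctl kickstart -k system/com.clamav.daemon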

Best way to do simple new employee laptop setup for Windows 10/11 without Windows Server or cloning

Posted: 10 Oct 2021 08:41 PM PDT

(Recommended I move here from SuperUser.) I ran a few Windows-based labs over a decade ago; for the past few years I've mostly been managing Macs and systems in AWS and GCP. Right now the company is hiring a lot of finance folks who want to work on Windows laptops. The basic setup is: I set up a local admin account so we can always get back into the machine, add a user account for the new employee, and install basic apps like Google Chrome, Office, Zoom, etc., nothing too fancy. We don't have Active Directory domain control running and don't have SCCM at this time. With this context, I was wondering if there's a way to do a simple automated install to new laptops, similar to macOS Migration Assistant, where a new laptop already has a fresh Windows OS from the factory and I just want to transfer the user account and additional installed apps. I did some research and mostly saw more elaborate enterprise options using SCCM, Windows Server, etc.

Ldap service not running on Windows Server 2019

Posted: 10 Oct 2021 06:42 PM PDT

I have two Windows Server 2019 machines, e.g. server1 and server2. server1 is the domain controller and has the following roles installed: ADDS, ADCS, DNS, File Storage, IIS.

server2 is joined to that domain controller and has the following roles installed: ADCS, File Storage, IIS.

I have setup PKI on server1 and everything works fine. I am able to use CRL as well as OCSP feature for certificate validation.

I wanted to make server2 a subordinate CA of server1 (the root CA), installed the corresponding role (ADCS), and am able to distribute user certificates; that works fine. But I am not able to test CRL functionality on server2, as it requires LDAP binding against server2.

As I debugged further, I found that the LDAP server is not running on server2: port 389 is listening on server1 but not on server2.

So how do I enable the LDAP service on server2? I am not able to test the CRL functionality of the PKI, because the CDP URL is an LDAP address.

Create multiple directories with mode and loop via ansible [SOLVED]

Posted: 10 Oct 2021 09:56 PM PDT

I'm trying to use loop in a playbook to have Ansible create multiple directories on a server with specific attributes: mode, owner, group.

I think I'm close, but I can't get it working.

I get this error:

Unsupported parameters for (file) module: recursive Supported parameters include: _diff_peek, _original_basename, access_time, access_time_format, attributes, backup, content, delimiter, directory_mode, follow, force, group, mode, modification_time, modification_time_format, owner, path, recurse, regexp, remote_src, selevel, serole, setype, seuser, src, state, unsafe_writes

Any advice would be much appreciated.

Here is the playbook sample :

- name: ansible create directory with_items example
  file:
    path: "{{ item.dest }}"
    mode: "{{item.mode}}"
    owner: "{{item.owner}}"
    group: "{{item.group}}"
    recursive: true
    state: directory
  loop:
    - { dest: '/var/lib/tftpboot/os/uefi/debian11', mode: '0744', owner: 'root', group: 'root' }
    - { dest: '/var/lib/tftpboot/os/uefi/ubuntu2004D', mode: '0744', owner: 'root', group: 'root'}
    - { dest: '/var/lib/tftpboot/os/uefi/f34w', mode: '0744', owner: 'root', group: 'root'}
    - { dest: '/var/lib/tftpboot/os/uefi/f34s', mode: '0744', owner: 'root', group: 'root'}
    - { dest: '/srv/nfs/isos', mode: '0744', owner: 'root', group: 'rpcuser'}
    - { dest: '/srv/nfs/pxe/debian11', mode: '0744', owner: 'root', group: 'rpcuser'}
    - { dest: '/srv/nfs/pxe/ubuntu2004', mode: '0744', owner: 'root', group: 'rpcuser'}
    - { dest: '/srv/nfs/pxe/f34w', mode: '0744', owner: 'root', group: 'rpcuser'}
    - { dest: '/srv/nfs/pxe/f34s', mode: '0744', owner: 'root', group: 'rpcuser'}
    - { dest: '/tmp/debian11', mode: '0744', owner: 'root', group: 'root'}
    - { dest: '/tmp/f34w', mode: '0744', owner: 'root', group: 'root'}
    - { dest: '/tmp/ubuntu2004D', mode: '0744', owner: 'root', group: 'root'}
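
For reference, the error message lists recurse (not recursive) among the supported parameters of the file module, so a corrected task would presumably look roughly like this:

- name: ansible create directory with_items example
  file:
    path: "{{ item.dest }}"
    mode: "{{ item.mode }}"
    owner: "{{ item.owner }}"
    group: "{{ item.group }}"
    recurse: true
    state: directory
  loop:
    - { dest: '/var/lib/tftpboot/os/uefi/debian11', mode: '0744', owner: 'root', group: 'root' }
    # ... remaining entries unchanged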

SQUID Transparent Proxy: Error INVALID_URL and ACCESS_DENIED

Posted: 10 Oct 2021 10:45 PM PDT

I configured a Squid proxy on CentOS 7, using Squid version 3.5.20. I also tried Squid 4.10 on Ubuntu 20.04, but I got the same problem. Maybe my ACL is wrong.

I configured DST-NAT on the router to intercept HTTP traffic from 192.168.1.0/24 and redirect it to the Squid proxy at 10.10.10.10:3128.

topology

This is /etc/squid/squid.conf file:

acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl whitelist_domain dstdomain "/etc/squid/whitelist.acl"

http_access allow localnet
http_access allow localhost
http_access allow whitelist_domain
http_access deny all

http_port 3128
coredump_dir /var/spool/squid
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%  1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%  0
refresh_pattern .               0   20%     4320

And this is the /etc/squid/whitelist.acl file:

linux.or.id
lipi.go.id

Please help me to find the problem.

So, with the above config, clients should be denied access to all HTTP websites except linux.or.id and lipi.go.id, right?

However, when I try to connect, every website shows this error: INVALID URL

This is /var/log/squid/access.log

1633885185.900      0 192.168.1.251 TAG_NONE/400 3867 GET / - HIER_NONE/- text/html
1633885185.970      0 192.168.1.251 TCP_IMS_HIT/304 295 GET http://linux:3128/squid-internal-static/icons/SN.png - HIER_NONE/- image/png

I was trying to change the squid.conf:

http_port 3128 intercept
http_port 3129

However, I then got ERROR ACCESS DENIED, which means my ACL blocked the access, right?
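
For context, my understanding is that http_access rules are evaluated top to bottom and the first match wins, so with allow localnet listed first the whitelist line is never reached. A sketch of the policy I am actually trying to express (whitelist-only for the LAN, with the DNATed traffic arriving on an intercept port) would be:

acl localnet src 192.168.1.0/24
acl whitelist_domain dstdomain "/etc/squid/whitelist.acl"

# allow LAN clients only to whitelisted domains, deny everything else
http_access allow localnet whitelist_domain
http_access allow localhost
http_access deny all

# 3128 receives the DNATed port-80 traffic, 3129 stays a plain proxy port
http_port 3128 intercept
http_port 3129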

GCP data centre fire safety details

Posted: 10 Oct 2021 06:06 PM PDT

I've been asked by a customer to provide fire safety details for Google Cloud Platform. They require it for their procurement data security policies. We use the europe_west2 London region to host their services and data. Is there any way to find out the fire safety details of this, or other data centres? I've tried extensively to find it through the GCP console and documentation and I've drawn a blank, and there is no contact available with the basic support plan that we're currently on.

many thanks

RemainAfterExit in Upstart

Posted: 10 Oct 2021 07:17 PM PDT

Is there an Upstart equivalent to systemd's RemainAfterExit?

I have an upstart task that exec's a script that completes quickly when the task is started. However, I would still like that task to report as active so that I can subsequently 'stop' the task and have it execute a cleanup script.

In systemd, I would do the following:

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/local/bin/my_script.sh create %i
ExecStop=/usr/local/bin/my_script.sh delete %i

How would I do the same thing in Upstart?
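
For illustration, what I am picturing is an Upstart job with no main process, roughly like the sketch below; I am not sure whether Upstart keeps such a job in the started state the way systemd's RemainAfterExit does, which is really what I am asking:

# /etc/init/my_task.conf (sketch)
description "oneshot-style job that should stay 'started' until stopped"

# no exec/script stanza, so there is no main process to track
pre-start exec /usr/local/bin/my_script.sh create
post-stop exec /usr/local/bin/my_script.sh delete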

How can I rsync an rpm link?

Posted: 10 Oct 2021 07:13 PM PDT

I want to download all the packages that are in here, but I do not want to download them one by one. How can I do that with rsync? Thanks!
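
To illustrate what I mean, something like one of these is what I am hoping for (the URLs are placeholders; the rsync variant only works if the mirror actually exposes an rsync service):

# if the mirror offers rsync
rsync -av rsync://mirror.example.org/path/to/repo/ ./rpms/

# otherwise, recursively fetch just the .rpm files over HTTP
wget -r -np -nd -A '*.rpm' https://mirror.example.org/path/to/repo/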

SIGKILL has no effect on a process running 100% CPU

Posted: 10 Oct 2021 10:07 PM PDT

I have some weird behaviour on my Pi 4 running Ubuntu Server 21.04. It runs correctly, but after a while I can see a process running at 100% CPU for hours, and if I wait longer there are 2, 3, ... other processes running at 100% CPU. They seem to be launched by a cron job (from the home automation system Jeedom), but this is not my question.

The weird thing is that I cannot kill them, even as root with kill -9. The process is in the R state, but not responding.

#ps aux | grep 46149
www-data   46149 99.7  0.0   2040    80 ?        R    Oct04 633:33 sh -c (ps ax || ps w) | grep -ie "cron_id=7$" | grep -v "grep"
#sudo kill -9 46149
#ps aux | grep 46149
www-data   46149 99.7  0.0   2040    80 ?        R    Oct04 633:36 sh -c (ps ax || ps w) | grep -ie "cron_id=7$" | grep -v "grep"

In this example, the blocked process is 'ps', but it is not always the same one. If I power off the Pi, it restarts normally, but another blocked process will appear after a while. And I do need to power off, because 'reboot' will not work.

Edit: Using 'ps axjf' to see process tree

      1    7317    7317    1799 ?             -1 Sl       0    0:56 /usr/bin/containerd-shim-runc-v2 -namespace moby -id bf40089312cdb1d7707096fe6fc46520c7c1a17a70eac305473761976c1f4b7d -address /run/cont     7317    7337    7337    7337 ?             -1 Ss       0    1:12  \_ /usr/bin/python2 /usr/bin/supervisord -c /etc/supervisor/supervisord.conf     7337    7391    7391    7337 ?             -1 S        0    0:02  |   \_ /usr/sbin/cron -f -L4     7391  104917    7391    7337 ?             -1 S        0    0:00  |   |   \_ /usr/sbin/CRON -f -L4   104917  104919  104919  104919 ?             -1 Ss       0    0:00  |   |   |   \_ /bin/sh -c /usr/bin/php /var/www/html/core/php/watchdog.php >> /dev/null   104919  104920  104919  104919 ?             -1 R        0 1521:41  |   |   |       \_ /bin/sh -c /usr/bin/php /var/www/html/core/php/watchdog.php >> /dev/null     7391  395309    7391    7337 ?             -1 S        0    0:00  |   |   \_ /usr/sbin/CRON -f -L4   395309  395312  395312  395312 ?             -1 Ss      33    0:00  |   |       \_ /bin/sh -c /usr/bin/php /var/www/html/core/php/jeeCron.php >> /dev/null   395312  395313  395312  395312 ?             -1 S       33    0:00  |   |           \_ /usr/bin/php /var/www/html/core/php/jeeCron.php   395313  395341  395312  395312 ?             -1 S       33    0:00  |   |               \_ sh -c (ps ax || ps w) | grep -ie "cron_id=4$" | grep -v "grep"   395341  395344  395312  395312 ?             -1 R       33  109:29  |   |                   \_ sh -c (ps ax || ps w) | grep -ie "cron_id=4$" | grep -v "grep"     7337    7392    7392    7337 ?             -1 S        1    0:00  |   \_ /usr/sbin/atd -f     7337    8613    8613    7337 ?             -1 Sl       0    6:16  |   \_ /usr/bin/python3 /usr/bin/fail2ban-server -fc /etc/fail2ban/     7337   11223   10184   10184 ?             -1 S       33    0:08  |   \_ php /var/www/html/core/class/../php/jeeCron.php cron_id=452778     7337   18465   18465   18465 ?             -1 SNs      0    0:08  |   \_ /usr/sbin/apache2 -k start    18465  168788   18465   18465 ?             -1 SN      33    0:48  |   |   \_ /usr/sbin/apache2 -k start    18465  354445   18465   18465 ?             -1 SN      33    0:27  |   |   \_ /usr/sbin/apache2 -k start    18465  356077   18465   18465 ?             -1 SN      33    0:24  |   |   \_ /usr/sbin/apache2 -k start    18465  356301   18465   18465 ?             -1 SN      33    0:25  |   |   \_ /usr/sbin/apache2 -k start    18465  362824   18465   18465 ?             -1 SN      33    0:16  |   |   \_ /usr/sbin/apache2 -k start    18465  364208   18465   18465 ?             -1 SN      33    0:14  |   |   \_ /usr/sbin/apache2 -k start    18465  366422   18465   18465 ?             -1 SN      33    0:12  |   |   \_ /usr/sbin/apache2 -k start    18465  366848   18465   18465 ?             -1 SN      33    0:12  |   |   \_ /usr/sbin/apache2 -k start    18465  367416   18465   18465 ?             -1 SN      33    0:10  |   |   \_ /usr/sbin/apache2 -k start    18465  367576   18465   18465 ?             -1 SN      33    0:11  |   |   \_ /usr/sbin/apache2 -k start    18465  405605   18465   18465 ?             -1 SN      33    0:03  |   |   \_ /usr/sbin/apache2 -k start     7337   18824   18465   18465 ?             -1 SN      33  174:59  |   \_ php /var/www/html/core/class/../php/jeeCron.php cron_id=301554     7337   35774   18465   18465 ?             
-1 SNl     33    0:31  |   \_ node /var/www/html/plugins/alexaapi/resources/alexaapi.js http://app_jeedom amazon.fr alexa.amazon.fr OtAkaDFZj3YlSEQg6T1VGk8Jq8     7337   44738   44738   44738 ?             -1 SNs    106    0:00  |   \_ /usr/bin/dbus-daemon --system     7337   44767   44766   44766 ?             -1 SN     107    1:13  |   \_ avahi-daemon: running [bf40089312cd.local]    44767   44768   44766   44766 ?             -1 SN     107    0:00  |   |   \_ avahi-daemon: chroot helper     7337   45616   18465   18465 ?             -1 SNl     33    4:20  |   \_ homebridge    45616   45664   18465   18465 ?             -1 SNl     33    2:10  |   |   \_ homebridge-config-ui-x     7337   46149   46102   46102 ?             -1 R       33 1931:04  |   \_ sh -c (ps ax || ps w) | grep -ie "cron_id=7$" | grep -v "grep"     7337  407386   18465   18465 ?             -1 RN      33    0:00  |   \_ php /var/www/html/core/class/../php/jeeListener.php listener_id=2 event_id=379484 value='1310' datetime='2021-10-06 06:36:25'     7317   22607   22607   22607 ?          22607 Ss+      0    0:00  \_ /bin/bash  

Edit

I tried to kill the parent: every level of the process tree was killed except the parent and the blocked processes (two processes this time, with the same parent). Now I have:

root        5790  0.0  0.0      0     0 ?        Ss   Oct09   0:14  \_ [sh]
www-data  267740 99.4  0.0   2040    84 ?        RN   05:05 1032:49      \_ sh -c ps ax | grep "resources/alexaapi.js" | grep -v "grep" | wc -l HOME=/var/www LOGNAME=www-data PATH=/usr/bin:/bin SHELL=/bi
www-data  357120 99.5  0.0   2040    80 ?        RN   14:00 501:07      \_ sh -c (ps ax || ps w) | grep -ie "cron_id=469432$" | grep -v "grep" HOME=/var/www LOGNAME=www-data PATH=/usr/bin:/bin SHELL=/bin

And with 'ps -ef':

root        5790    5760  0 Oct09 ?        00:00:14 [sh]
www-data  267740    5790 99 Oct10 ?        1-01:58:16 sh -c ps ax | grep "resources/alexaapi.js" | grep -v "grep" | wc -l
www-data  357120    5790 99 Oct10 ?        17:06:33 sh -c (ps ax || ps w) | grep -ie "cron_id=469432$" | grep -v "grep"
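
In case it is useful, these are the kinds of checks I have been running against the stuck PIDs (a sketch; /proc/PID/stack needs root and may be empty for a task spinning purely in userspace):

# state and signal masks of the stuck process
grep -E 'State|SigPnd|SigBlk|SigIgn' /proc/46149/status

# where it is sitting in the kernel, if anywhere
sudo cat /proc/46149/stack

# any kernel complaints (hung tasks, oops, hardware errors)?
dmesg | tail -50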

WHfB - Hybrid Certificate Trust - Failed provisioning

Posted: 10 Oct 2021 07:03 PM PDT

After setting up Windows Hello for Business in a Hybrid Azure AD Joined Certificate Trust deployment scenario, I ended up with the following events on my test client machine after a failed provisioning.

I reviewed my setup, but I must be missing something. Any help would be highly appreciated.

    ##############################    Microsoft-Windows-AAD/Operational      TimeCreated : 13/05/2020 11:57:04   Id          : 1082  Message     : Key error: DecodingProtectedCredentialKeyFatalFailure                Error description: AADSTS9002313: Invalid request. Request is malformed or invalid.                Trace ID: 834deec1-21d8-48c2-bae5-7f795e312f00                Correlation ID: 88bc2dda-ba2a-42dc-a9da-7b9f362f7d7a                Timestamp: 2020-05-13 22:57:04Z                CorrelationID: 88bc2dda-ba2a-42dc-a9da-7b9f362f7d7a      TimeCreated : 13/05/2020 11:57:03   Id          : 1118  Message     : Enterprise STS Logon failure. Status: 0xC000006D Correlation ID: FE6DBC4F-69BB-426B-933B-0BADB38A1361    TimeCreated : 13/05/2020 11:57:03   Id          : 1081  Message     : OAuth response error: invalid_grant                Error description: MSIS9683: Received invalid OAuth JWT Bearer request. Transport key for the device is invalid. It must be a RSA public key blob or TPM storage key blob.                CorrelationID:       TimeCreated : 13/05/2020 11:57:03   Id          : 1025  Message     : Http request status: 400. Method: POST Endpoint Uri: https://adfs.domain.com/adfs/oauth2/token/ Correlation ID: FE6DBC4F-69BB-426B-933B-0BADB38A1361    TimeCreated : 13/05/2020 11:56:01   Id          : 1082  Message     : Key error: DecodingProtectedCredentialKeyFatalFailure                Error description: AADSTS9002313: Invalid request. Request is malformed or invalid.                Trace ID: 4a2197fa-c85f-4ea0-af79-1a830e1d2d00                Correlation ID: f6141ebb-116c-4701-9118-80124017b6d1                Timestamp: 2020-05-13 22:56:02Z                CorrelationID: f6141ebb-116c-4701-9118-80124017b6d1      TimeCreated : 13/05/2020 11:56:01   Id          : 1118  Message     : Enterprise STS Logon failure. Status: 0xC000006D Correlation ID: E5C246DD-9FFF-4E07-92A5-61389B08C64A    TimeCreated : 13/05/2020 11:56:01   Id          : 1081  Message     : OAuth response error: invalid_grant                Error description: MSIS9683: Received invalid OAuth JWT Bearer request. Transport key for the device is invalid. It must be a RSA public key blob or TPM storage key blob.                CorrelationID:       TimeCreated : 13/05/2020 11:56:01   Id          : 1025  Message     : Http request status: 400. Method: POST Endpoint Uri: https://adfs.domain.com/adfs/oauth2/token/ Correlation ID: E5C246DD-9FFF-4E07-92A5-61389B08C64A        #######################################  Microsoft-Windows-HelloForBusiness/Operational      TimeCreated : 13/05/2020 11:57:00   Id          : 5520  Message     : Device unlock policy is not configured on this device.    TimeCreated : 13/05/2020 11:56:03   Id          : 7054  Message     : Windows Hello for Business prerequisites check failed.                  Error: 0x1    TimeCreated : 13/05/2020 11:56:03   Id          : 8205  Message     : Windows Hello for Business successfully located a usable sign-on certificate template.    TimeCreated : 13/05/2020 11:56:03   Id          : 8206  Message     : Windows Hello for Business successfully located a certificate registration authority.    TimeCreated : 13/05/2020 11:56:03   Id          : 7211  Message     : The Secondary Account Primary Refresh Token prerequisite check failed.    TimeCreated : 13/05/2020 11:56:03   Id          : 8202  Message     : The device meets Windows Hello for Business hardware requirements.    
TimeCreated : 13/05/2020 11:56:03   Id          : 8204  Message     : Windows Hello for Business post-logon provisioning is enabled.    TimeCreated : 13/05/2020 11:56:03   Id          : 8203  Message     : Windows Hello for Business is enabled.    TimeCreated : 13/05/2020 11:56:03   Id          : 5204  Message     : Windows Hello for Business certificate enrollment configurations:                   Certificate Enrollment Method: RA                Certificate Required for On-Premise Auth: true    TimeCreated : 13/05/2020 11:56:03   Id          : 8200  Message     : The device registration prerequisite check completed successfully.    TimeCreated : 13/05/2020 11:56:03   Id          : 8201  Message     : The Primary Account Primary Refresh Token prerequisite check completed successfully.    TimeCreated : 13/05/2020 11:56:03   Id          : 8210  Message     : Windows Hello for Business successfully completed the remote desktop prerequisite check.    TimeCreated : 13/05/2020 11:56:03   Id          : 3054  Message     : Windows Hello for Business prerequisites check started.    TimeCreated : 13/05/2020 11:56:00   Id          : 8025  Message     : The Microsoft Passport Container service started successfully.    TimeCreated : 13/05/2020 11:56:00   Id          : 8025  Message     : The Microsoft Passport service started successfully.    TimeCreated : 13/05/2020 11:55:07   Id          : 5520  Message     : Device unlock policy is not configured on this device.        #######################################  Microsoft-Windows-User Device Registration/Admin      TimeCreated : 13/05/2020 11:56:59   Id          : 331  Message     : Automatic device join pre-check tasks completed. Debug output:\r\n preCheckResult: DoNotJoin                deviceKeysHealthy: YES                isJoined: YES                isDcAvailable: YES                isSystem: YES                keyProvider: Microsoft Platform Crypto Provider                keyContainer: c9bc09fb-e9bd-4de7-b06a-f8798e6f377c                dsrInstance: AzureDrs                elapsedSeconds: 0                resultCode: 0x1      TimeCreated : 13/05/2020 11:56:59   Id          : 335  Message     : Automatic device join pre-check tasks completed. The device is already joined.    TimeCreated : 13/05/2020 11:56:03   Id          : 360  Message     : Windows Hello for Business provisioning will not be launched.                 Device is AAD joined ( AADJ or DJ++ ): Yes                 User has logged on with AAD credentials: Yes                 Windows Hello for Business policy is enabled: Yes                 Windows Hello for Business post-logon provisioning is enabled: Yes                 Local computer meets Windows hello for business hardware requirements: Yes                 User is not connected to the machine via Remote Desktop: Yes                 User certificate for on premise auth policy is enabled: Yes                 Machine is governed by enrollment authority policy.                 See https://go.microsoft.com/fwlink/?linkid=832647 for more details.    TimeCreated : 13/05/2020 11:56:03   Id          : 362  Message     : Windows Hello for Business provisioning will not be launched.                 
Device is AAD joined ( AADJ or DJ++ ): Yes                 User has logged on with AAD credentials: Yes                 Windows Hello for Business policy is enabled: Yes                 Windows Hello for Business post-logon provisioning is enabled: Yes                 Local computer meets Windows hello for business hardware requirements: Yes                 User is not connected to the machine via Remote Desktop: Yes                 User certificate for on premise auth policy is enabled: Yes                 Enterprise user logon certificate enrollment endpoint is ready: Yes                 Enterprise user logon certificate template is : Yes                 User has successfully authenticated to the enterprise STS: No                 Certificate enrollment method: enrollment authority                 See https://go.microsoft.com/fwlink/?linkid=832647 for more details.    TimeCreated : 13/05/2020 11:55:09   Id          : 331  Message     : Automatic device join pre-check tasks completed. Debug output:\r\n preCheckResult: DoNotJoin                deviceKeysHealthy: YES                isJoined: YES                isDcAvailable: YES                isSystem: YES                keyProvider: Microsoft Platform Crypto Provider                keyContainer: c9bc09fb-e9bd-4de7-b06a-f8798e6f377c                dsrInstance: AzureDrs                elapsedSeconds: 1                resultCode: 0x1      TimeCreated : 13/05/2020 11:55:09   Id          : 335  Message     : Automatic device join pre-check tasks completed. The device is already joined.    TimeCreated : 13/05/2020 11:55:05   Id          : 369  Message     : The Workstation Service logged a device registration message.                 Message: AutoJoinSvc/WJComputeWorkplaceJoinTaskState: Machine is already joined to Azure AD.      TimeCreated : 13/05/2020 11:55:05   Id          : 369  Message     : The Workstation Service logged a device registration message.                 Message: AutoJoinSvc/WJSetScheduledTaskState: Running task "\Microsoft\Windows\Workplace Join\Automatic-Device-Join".     TimeCreated : 13/05/2020 11:55:05   Id          : 369  Message     : The Workstation Service logged a device registration message.                 Message: AutoJoinSvc/WJComputeWorkplaceJoinTaskState: Global policy found with value 1.  

Having an issue enabling internode TLS support in rabbitmq / erlang

Posted: 10 Oct 2021 10:18 PM PDT

We are running rabbit v3.8.3-1.el7, erlang v23.3.3.el7, kernel 3.10.0-1062.12.1.el7.x86_64, release Centos 7.7

I have three nodes that I would like in disc mode, cdvlhbqr23[1-3]

However I'm running into an issue after attempting to enable TLS on erlang.

[ cdvlhbqr231:rabbitmq ]  10.128.3.231 :: root -> rabbitmqctl status  Error: unable to perform an operation on node 'rabbit@cdvlhbqr231'. Please see diagnostics information and suggestions below.    Most common reasons for this are:     * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)   * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)   * Target node is not running    In addition to the diagnostics info below:     * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more   * Consult server logs on node rabbit@cdvlhbqr231   * If target node is configured to use long node names, don't forget to use --longnames with CLI tools    DIAGNOSTICS  ===========    attempted to contact: [rabbit@cdvlhbqr231]    rabbit@cdvlhbqr231:    * connected to epmd (port 4369) on cdvlhbqr231    * epmd reports node 'rabbit' uses port 25672 for inter-node and CLI tool traffic    * TCP connection succeeded but Erlang distribution failed    * suggestion: check if the Erlang cookie identical for all server nodes and CLI tools    * suggestion: check if all server nodes and CLI tools use consistent hostnames when addressing each other    * suggestion: check if inter-node connections may be configured to use TLS. If so, all nodes and CLI tools must do that     * suggestion: see the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more      Current node details:   * node name: 'rabbitmqcli-23412-rabbit@cdvlhbqr231'   * effective user's home directory: /var/lib/rabbitmq/   * Erlang cookie hash: MudCW7tn3FA5sTmC1FlR0g==  

I've double-checked the cookie file and it's identical across all nodes. All of the hostnames are correct and consistent across the nodes. So I assume this has to be a direct result of the SSL/TLS change.

Here's what the node config looks like:

[ cdvlhbqr231:rabbitmq ]  10.128.3.231 :: root -> cat /etc/rabbitmq/rabbitmq.config  [      {rabbit,          [              {vm_memory_high_watermark, 0.4},              {vm_memory_high_watermark_paging_ratio, 0.5},              {memory_alarms, true},              {disk_free_limit, 41686528},              {cluster_partition_handling, autoheal},              {tcp_listen_options,                  [binary,                      {packet, raw},                      {reuseaddr, true},                      {backlog, 128},                      {nodelay, true},                      {exit_on_close, false},                      {keepalive, true}                  ]              },              {cluster_nodes, {['rabbit@cdvlhbqr231', 'rabbit@cdvlhbqr232', 'rabbit@cdvlhbqr233'], disc}},              {loopback_users, []},              {tcp_listeners, [{"0.0.0.0",5672}]},              {ssl_listeners, [{"0.0.0.0",5671}]},              {ssl_options, [                  {cacertfile,"/etc/pki/tls/certs/ca-bundle.crt"},                  {certfile,"/etc/rabbitmq/ssl/cert.pem"},                  {keyfile,"/etc/rabbitmq/ssl/key.pem"},                  {verify,verify_peer},                  {versions, ['tlsv1.2']},                  {fail_if_no_peer_cert,false}              ]}            ]      },      {rabbitmq_management,          [{              listener, [                  {port, 15672},                  {ip, "0.0.0.0"},                                  {ssl, true},                  {ssl_opts, [                      {cacertfile,"/etc/pki/tls/certs/ca-bundle.crt"},                      {certfile,"/etc/rabbitmq/ssl/cert.pem"},                      {keyfile,"/etc/rabbitmq/ssl/key.pem"},                      {verify,verify_peer},                      {versions, ['tlsv1.2']}                  ]}              ]}          ]      }  ].  

The private key is generated on the host and signed by an intermediate CA whose pub key is in the systems extracted cert bundle. We generate an "/etc/rabbitmq/ssl/allfile.pem" which is a bundle of the servers private key and signed cert.

The ssl environment for erlang is defined as the following:

[ cdvlhbqr231:rabbitmq ]
10.128.3.231 :: root -> cat rabbitmq-env.conf
# Obtaining of an Erlang ssl library path
export HOME=/var/lib/rabbitmq/
ERL_SSL_PATH=/usr/lib64/erlang/lib/ssl-9.6.2/ebin

# Add SSL-related environment vars for rabbitmq-server and rabbitmqctl
SERVER_ADDITIONAL_ERL_ARGS="-pa $ERL_SSL_PATH \
  -proto_dist inet_tls \
  -ssl_dist_opt server_certfile '/etc/rabbitmq/ssl/allfile.pem' \
  -ssl_dist_opt server_secure_renegotiate true client_secure_renegotiate true"

# CLI
CTL_ERL_ARGS="-pa $ERL_SSL_PATH \
  -proto_dist inet_tls \
  -ssl_dist_opt server_certfile /etc/rabbitmq/ssl/allfile.pem \
  -ssl_dist_opt server_secure_renegotiate true client_secure_renegotiate true"

I'm not entirely clear on what's causing the issue; I thought I had followed the documentation to a T. Can anyone help me review this and see if there is anything obvious I'm missing, or suggest how to track down the problem?
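
One check I have been using to see whether the distribution listener is actually speaking TLS is to probe the inter-node port reported by epmd directly (a sketch):

# list the distribution ports epmd knows about
epmd -names

# if inter-node TLS is active, this should complete a handshake and print the node's certificate
openssl s_client -connect cdvlhbqr231:25672 </dev/null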

systemctl limits - solr complaining

Posted: 10 Oct 2021 10:08 PM PDT

I'm building a Solr server (on Ubuntu 18.04, using the repo packages solr-common and solr-jetty). On startup, Solr was reporting that nofile and nproc (1024 and 6721 respectively) were set too low. I ran systemctl edit solr and created an override as follows:

[Service]
LimitNOFILE=65000
LimitNPROC=65000

I then restarted the service - solr still reporting the same issue.

I added /etc/security/limits.d/solr containing:

solr hard nofile 65535
solr soft nofile 65535
solr hard nproc 65535
solr soft nproc 65535

It is still reporting the same issue after restarting the service:

# systemctl status solr
● solr.service - LSB: Controls Apache Solr as a Service
   Loaded: loaded (/etc/init.d/solr; generated)
  Drop-In: /etc/systemd/system/solr.service.d
           └─override.conf
   Active: active (exited) since Mon 2020-03-30 14:55:49 BST; 6s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 6848 ExecStop=/etc/init.d/solr stop (code=exited, status=0/SUCCESS)
  Process: 6973 ExecStart=/etc/init.d/solr start (code=exited, status=0/SUCCESS)

Mar 30 14:55:43 dev-a01-si-solr.bip solr[6973]: *** [WARN] *** Your open file limit is currently 1024.
Mar 30 14:55:43 dev-a01-si-solr.bip solr[6973]:  It should be set to 65000 to avoid operational disruption.
Mar 30 14:55:43 dev-a01-si-solr.bip solr[6973]:  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Mar 30 14:55:43 dev-a01-si-solr.bip solr[6973]: *** [WARN] ***  Your Max Processes Limit is currently 6721.
Mar 30 14:55:43 dev-a01-si-solr.bip solr[6973]:  It should be set to 65000 to avoid operational disruption.
Mar 30 14:55:43 dev-a01-si-solr.bip solr[6973]:  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Mar 30 14:55:49 dev-a01-si-solr.bip solr[6973]: [194B blob data]
Mar 30 14:55:49 dev-a01-si-solr.bip solr[6973]: Started Solr server on port 8983 (pid=7045). Happy searching!
Mar 30 14:55:49 dev-a01-si-solr.bip solr[6973]: [14B blob data]
Mar 30 14:55:49 dev-a01-si-solr.bip systemd[1]: Started LSB: Controls Apache Solr as a Service.

What am I doing wrong here?
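
For reference, this is how I have been comparing what systemd thinks the limits are with what the running Solr process actually got (a sketch):

# what systemd thinks the unit's limits are
systemctl show solr | grep -E 'LimitNOFILE|LimitNPROC'

# what the running process actually got
cat /proc/$(pgrep -f solr | head -1)/limits | grep -E 'open files|processes'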

update After amending /etc/systemd/system.conf to contain...

DefaultLimitNOFILE=65000
DefaultLimitNPROC=65000

Solr is no longer complaining about the file limit but still complaining about the process limit. WTF Pottering?

  Drop-In: /etc/systemd/system/solr.service.d             └─override.conf     Active: active (exited) since Mon 2020-03-30 15:21:59 BST; 14s ago       Docs: man:systemd-sysv-generator(8)    Process: 1141 ExecStart=/etc/init.d/solr start (code=exited, status=0/SUCCESS)    Mar 30 15:21:51 dev-a01-si-solr.bip solr[1141]:  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh  Mar 30 15:21:51 dev-a01-si-solr.bip solr[1141]: *** [WARN] ***  Your Max Processes Limit is currently 6721.  Mar 30 15:21:51 dev-a01-si-solr.bip solr[1141]:  It should be set to 65000 to avoid operational disruption.  Mar 30 15:21:51 dev-a01-si-solr.bip solr[1141]:  If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh  Mar 30 15:21:51 dev-a01-si-solr.bip solr[1141]: Warning: Available entropy is low. As a result, use of the UUIDField, SSL, or any other features that require  Mar 30 15:21:51 dev-a01-si-solr.bip solr[1141]: RNG might not work properly. To check for the amount of available entropy, use 'cat /proc/sys/kernel/random/entropy_avail'.  Mar 30 15:21:59 dev-a01-si-solr.bip solr[1141]: [230B blob data]  Mar 30 15:21:59 dev-a01-si-solr.bip solr[1141]: Started Solr server on port 8983 (pid=1459). Happy searching!  Mar 30 15:21:59 dev-a01-si-solr.bip solr[1141]: [14B blob data]  Mar 30 15:21:59 dev-a01-si-solr.bip systemd[1]: Started LSB: Controls Apache Solr as a Service.  

Amending user.conf to match did not help.

Update 2 Well, this just keeps getting better and better. The disappearance of the nfile warning came after a reboot of the host. When I subsequently run systemctl restart solr I get this:

Mar 30 15:39:21 dev-a01-si-solr.bip solr[2503]: *** [WARN] *** Your open file limit is currently 1024.
Mar 30 15:39:21 dev-a01-si-solr.bip solr[2503]:  It should be set to 65000 to avoid operational disruption.

FFS!

Now, where did I put that Centos 5 CD?

Update 3

It turns out that this was no longer the packaged Solr. Unbeknownst to me, someone had problems getting the original build to work and found a tutorial on the internet on how to install from a tarball. So I now have a system with a half-tarball/half-repo Solr which we can't patch or upgrade.

How to fix broken packages on Ubuntu

Posted: 10 Oct 2021 08:29 PM PDT

I have an Ubuntu 18.10 server and recently tried to update git. I keep getting errors that a number of packages are not properly installed.

Errors were encountered while processing:
 libpaper1:amd64
 libpaper-utils
 unattended-upgrades
 libgs9:amd64
 ghostscript

Then I ran dpkg --configure -a and saw the same errors. I want to be careful and not hose my system, but how can I fix these errors?

~ $ sudo apt list --upgradable  Listing... Done  ~ $ sudo apt-get check  Reading package lists... Done  Building dependency tree  Reading state information... Done  ~ $ sudo apt-get check  Reading package lists... Done  Building dependency tree  Reading state information... Done  ~ $ sudo apt-get upgrade  Reading package lists... Done  Building dependency tree  Reading state information... Done  Calculating upgrade... Done  0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.  5 not fully installed or removed.  After this operation, 0 B of additional disk space will be used.  Do you want to continue? [Y/n] y  Setting up libpaper1:amd64 (1.1.24+nmu5ubuntu1) ...  dpkg: error processing package libpaper1:amd64 (--configure):   installed libpaper1:amd64 package post-installation script subprocess returned error exit status 10  dpkg: dependency problems prevent configuration of libpaper-utils:   libpaper-utils depends on libpaper1; however:    Package libpaper1:amd64 is not configured yet.    dpkg: error processing package libpaper-utils (--configure):   dependency problems - leaving unconfigured  Setting up unattended-upgrades (1.5ubuntu3.18.10.4) ...  dpkg: error processing package unattended-upgrades (--configure):   installed unattended-upgrades package post-installation script subprocess returned error exit status 10  dpkg: dependency problems prevent configuration of libgs9:amd64:   libgs9:amd64 depends on libpaper1; however:    Package libpaper1:amd64 is not configured yet.    dpkg: error processing package libgs9:amd64 (--configure):   dependency problems - leaving unconfigured  dpkg: dependency problems prevent configuration of ghostscript:   ghostscript depends on libgs9 (= 9.26~dfsg+0-0ubuntu0.18.10.9); however:    Package libgs9:amd64 is not configured yet.    dpkg: error processing package ghostscript (--configure):   dependency problems - leaving unconfigured  Processing triggers for libc-bin (2.28-0ubuntu1) ...  Errors were encountered while processing:   libpaper1:amd64   libpaper-utils   unattended-upgrades   libgs9:amd64   ghostscript  E: Sub-process /usr/bin/dpkg returned an error code (1)  

EDIT

In response to @Stefan Skoglund's question:

~ $ sudo apt-cache policy libpaper1
libpaper1:
  Installed: 1.1.24+nmu5ubuntu1
  Candidate: 1.1.24+nmu5ubuntu1
  Version table:
 *** 1.1.24+nmu5ubuntu1 500
        500 http://mirror.hetzner.de/ubuntu/packages cosmic/main amd64 Packages
        500 http://de.archive.ubuntu.com/ubuntu cosmic/main amd64 Packages
        100 /var/lib/dpkg/status

~ $ sudo dpkg-reconfigure -plow libpaper1
/usr/sbin/dpkg-reconfigure: libpaper1 is broken or not fully installed

EDIT 2

Throwing caution to the wind, I closed my eyes, crossed my fingers and tried this:

sudo apt-get --purge remove libpaper1:amd64 libpaper-utils unattended-upgrades libgs9:amd64 ghostscript
sudo apt-get clean
sudo apt-get update && sudo apt-get upgrade
sudo apt autoremove

It magically worked.

Bounty is still available to someone who could explain what happened here and what the best practice / troubleshooting hints would be.
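
In case it helps anyone answering: the part I would most like explained is why the maintainer script kept failing with status 10, which I understand can be inspected by running it by hand with tracing, something like this (a sketch; the exact filename under /var/lib/dpkg/info/ may differ):

# find the package's post-installation script
ls /var/lib/dpkg/info/ | grep libpaper1

# re-run it with shell tracing to see which command returns status 10
sudo sh -x /var/lib/dpkg/info/libpaper1:amd64.postinst configure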

MS SQL Port Forwarding with IP Tables

Posted: 10 Oct 2021 08:01 PM PDT

I have 2 remote servers, one Linux box and one Windows Box. The Windows box is running MS SQL Server. It's behind a firewall, and I can only access it from my Linux box (I can netcat to the Windows box on port 1433 so that's working ok).

I'd like to use the linux box as a proxy so I can connect to the MS-SQL Server from my desktop.

I've tried setting up iptables on the Linux box with the following config, but my desktop still won't connect.

#!/bin/sh

echo 1 > /proc/sys/net/ipv4/ip_forward

iptables -F
iptables -t nat -F
iptables -X
iptables -t nat -A PREROUTING -p tcp --dport 1433 -j DNAT --to-destination xxx.xxx.xxx.xxx:1433

Any help would be greatly appreciated!
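
For completeness, this is the full set of rules I understand to be needed if the DNAT rule alone is not enough; xxx.xxx.xxx.xxx is the Windows box as above, and the MASQUERADE rule is there so the replies come back through the Linux box (a sketch):

echo 1 > /proc/sys/net/ipv4/ip_forward

iptables -t nat -A PREROUTING -p tcp --dport 1433 -j DNAT --to-destination xxx.xxx.xxx.xxx:1433
iptables -t nat -A POSTROUTING -p tcp -d xxx.xxx.xxx.xxx --dport 1433 -j MASQUERADE
iptables -A FORWARD -p tcp -d xxx.xxx.xxx.xxx --dport 1433 -j ACCEPT
iptables -A FORWARD -p tcp -s xxx.xxx.xxx.xxx --sport 1433 -j ACCEPT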

NRPE: Unable to read output - Permissions issues

Posted: 10 Oct 2021 06:03 PM PDT

I am trying to be as clear as possible: my brain is going to explode, like those exploding kittens.

Both machines Centos 7:

[root@192.168.10.2]# cat /proc/version
Linux version 3.10.0-693.11.6.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jan 4 01:06:37 UTC 2018

And latest NRPE from EPEL:

[root@192.168.10.1]# ./check_nrpe -H 192.168.10.2
NRPE v3.2.0

I am trying to restart a service from the Nagios server so I can set up an event handler. It all started with a lot of scripts, but I have now shrunk the problem down to this:

[root@192.168.10.1]# ./check_nrpe -H 192.168.10.2 -c restart
NRPE: Unable to read output

[root@192.168.10.1]# ./check_nrpe -H 192.168.10.2 -c status
(... correct service status output ...)
Loaded: loaded (/usr/lib/systemd/system/cachefilesd.service
(... correct service status output ...)

So, I can status services, but cannot start or restart.

[root@192.168.10.2]# cat /etc/nagios/nrpe.conf

[...]
nrpe_user=nrpe
nrpe_group=nrpe
allowed_hosts=127.0.0.1,192.168.10.1
command[status]=/lib64/nagios/plugins/status.sh
command[restart]=/lib64/nagios/plugins/restart.sh
[...]

[root@192.168.10.2]# cat /lib64/nagios/plugins/status.sh

#!/bin/bash
sudo systemctl status cachefilesd
exit 0

and

[root@192.168.10.2]# cat /lib64/nagios/plugins/restart.sh

#!/bin/bash
sudo systemctl restart cachefilesd
exit 0

sudoers:

[root@192.168.10.2]# cat /etc/sudoers

# Defaults specification
Defaults: nrpe !requiretty
Defaults: nagios !requiretty

nagios ALL = NOPASSWD: /sbin/service,/usr/bin/systemctl,/usr/sbin/service
nrpe ALL = NOPASSWD: /sbin/service,/usr/bin/systemctl,/usr/sbin/service

If I type:

[root@192.168.10.2]# sudo -u nrpe -H ./restart-cachefilesd.sh  

All is fine.

I enabled debug in NRPE, and I get:

nrpe[5431]: Host address is in allowed_hosts
nrpe[5431]: Host 192.168.10.1 is asking for command 'restart' to be run...
nrpe[5431]: Running command: /lib64/nagios/plugins/restart.sh
nrpe[5432]: WARNING: my_system() seteuid(0): Operation not permitted
nrpe[5431]: Command completed with return code 0 and output:
nrpe[5431]: Return Code: 3, Output: NRPE: Unable to read output
nrpe[5431]: Connection from 192.168.10.1 closed.

I tried to strace the output, but it is too much for me...
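
One thing I have not ruled out yet is SELinux, given the seteuid(0) failure in the debug output; this is how I plan to check (a sketch, on CentOS 7):

# is SELinux enforcing, and is it logging denials for nrpe?
getenforce
sudo ausearch -m avc -ts recent | grep -i nrpe

# temporarily switch to permissive to see whether the restart command starts working
sudo setenforce 0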

LAPS Find-AdmPwdExtendedRights : Object not found

Posted: 10 Oct 2021 09:02 PM PDT

I am setting up LAPS at the moment and want to use the standard "Computers" Organisational Unit.

I am working through the setup guide but I keep getting this error:

PS C:\Users\Administrator.DOMAIN> Find-AdmPwdExtendedRights -OrgUnit "Computers" | Format-Table
Find-AdmPwdExtendedRights : Object not found
At line:1 char:1
+ Find-AdmPwdExtendedRights -OrgUnit "Computers" | Format-Table
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Find-AdmPwdExtendedRights], NotFoundException
    + FullyQualifiedErrorId : AdmPwd.PSTypes.NotFoundException,AdmPwd.PS.FindExtendedRights

I get a similar "Object not found" error message when I try using the Set-AdmPwdComputerSelfPermission cmdlet, etc.
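
For reference, my understanding is that the default "Computers" location is a container rather than an organisational unit, and a quick way to confirm what kind of object it actually is would be something like this sketch:

# shows whether "Computers" is an organizationalUnit or just a container
Get-ADObject -Filter 'Name -eq "Computers"' -Properties objectClass |
    Select-Object Name, objectClass, DistinguishedName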

Certutil -revoke

Posted: 10 Oct 2021 07:03 PM PDT

I am trying to use certutil to manage my CA. Is there a way to use certutil -revoke with a "RequestID=?" filter?

I only see it working with the SerialNumber of the certificate, which is not really convenient.
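
The closest workaround I am aware of is looking the serial number up by request ID first and then revoking by serial, roughly like this (a sketch; 42 is a placeholder request ID and the trailing 0 is the "unspecified" reason code):

certutil -view -restrict "RequestID=42" -out "SerialNumber"
certutil -revoke <SerialNumberFromAbove> 0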

Envy

Can OpenSSL be used to debug an SSL connection to a MySQL server?

Posted: 10 Oct 2021 09:25 PM PDT

I want my webserver to talk to the MySQL database server over an SSL connection. The webserver runs CentOS 5; the database server runs FreeBSD. The certificates are provided by an intermediate CA, DigiCert.

MySQL should be using ssl, according to my.cnf:

# The MySQL server
[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
ssl
ssl-capath = /opt/mysql/pki/CA
ssl-cert = /opt/mysql/pki/server-cert.pem
ssl-key = /opt/mysql/pki/server-key.pem

When I start MySQL, the daemon starts without errors. This suggests that the certificate files are all readable.

But when I try to connect from the webserver to the database server, I get an error:

[root@webserver ~]# mysql -h mysql.example.org -u user -p
ERROR 2026 (HY000): SSL connection error

And if I try to debug further with openssl:

[root@webserver ~]# openssl s_client -connect mysql.example.org:3306 0>/dev/null
CONNECTED(00000003)
15706:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:588:

Is this a valid way to test the SSL connection to a MySQL database server? The SSL23_GET_SERVER_HELLO:unknown protocol message is strange since this typically what you would see if you were speaking SSL on a port intended for non-SSL traffic.

This same openssl command seems to work fine with LDAP & HTTP servers:

$ openssl s_client -connect ldap.example.org:636 0>/dev/null
CONNECTED(00000003)
depth=2 /C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
...
$ openssl s_client -connect www.example.org:443 0>/dev/null
CONNECTED(00000003)
depth=0 /DC=org/DC=example/OU=Services/CN=www.example.org
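
From what I have read since, MySQL starts its protocol in plaintext and only upgrades to TLS after the initial handshake, which would explain the unknown protocol error; newer OpenSSL releases (1.1.1 and later, so probably not the one shipped with CentOS 5) can drive that upgrade themselves, e.g.:

# requires an openssl new enough to support -starttls mysql
openssl s_client -connect mysql.example.org:3306 -starttls mysql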

How to install APC on a vagrant box running Ubuntu with PHP?

Posted: 10 Oct 2021 09:02 PM PDT

In my "Vagrant" file I have this line:

chef.add_recipe("php::module_apc")  

But it gives me this error:

[2013-01-11T22:14:53+00:00] INFO: Processing package[php-apc] action install (php::module_apc line 34)  ================================================================================  Error executing action `install` on resource 'package[php-apc]'  ================================================================================  Chef::Exceptions::Exec  ----------------------  apt-get -q -y install php-apc=3.1.7-1 returned 100, expected 0    Resource Declaration:  ---------------------  # In /tmp/vagrant-chef-1/chef-solo-1/cookbooks/php/recipes/module_apc.rb     33: when "debian"   34:   package "php-apc" do   35:     action :install   36:   end   37: end    Compiled Resource:  ------------------  # Declared in /tmp/vagrant-chef-1/chef-solo-1/cookbooks/php/recipes/module_apc.rb:34:in `from_file'    package("php-apc") do    retry_delay 2    retries 0    recipe_name "module_apc"    action [:install]    cookbook_name :php    package_name "php-apc"  end  [2013-01-11T22:14:53+00:00] ERROR: Running exception handlers  [2013-01-11T22:14:53+00:00] ERROR: Exception handlers complete  [2013-01-11T22:14:53+00:00] FATAL: Stacktrace dumped to /tmp/vagrant-chef-1/chef-stacktrace.out  [2013-01-11T22:14:53+00:00] FATAL: Chef::Exceptions::Exec: package[php-apc] (php::module_apc line 34) had an error: Chef::Exceptions::Exec: apt-get -q -y install php-apc=3.1.7-1 returned 100, expected 0  Chef never successfully completed! Any errors should be visible in the output above. Please fix your recipes so that they properly complete.  

I'm also running this before:

chef.add_recipe("apt")  

But it's no help either.

Any ideas how to fix this? Thanks a lot!

Btw, I'm using all cookbooks from OpsCode: https://github.com/opscode-cookbooks/

Network behavior of slow Windows shared storage

Posted: 10 Oct 2021 06:03 PM PDT

I have got on my hands 3 Windows XP file servers (their sole purpose is their SMB share) running in an office with about 50 users. The workload is only office usage: the users store and share Access databases and XLS files among themselves, and they use the files over the network share.

It is almost instantaneous to copy a 700 Kb XLS file from one of the servers to a workstation, but it takes over a minute to load it from a remote share with Excel. This same file is loaded in a few seconds if from a local disk.

I don't know what makes access to the file so slow when using it via the network. I suspect it is some quirk of Windows remote file access (maybe authentication?), and I hope it is possible to change some simple flag on the servers to speed things up to a sane speed. I have taken screenshots of the network usage while loading the aforementioned XLS file; can you recognize this pattern and possibly give me some clue about what the problem is?

In the first image there are two runs of Excel loading the remote file, both taking more than one minute to complete. The top and bottom graphs are of the same thing, but I only found the Task Manager option to discriminate between upload (red) and download (yellow) later, so I took two different screenshots (concatenated below). Both runs took more than one minute, possibly more than two minutes.

runs_1_2

In the second image there are the 3rd and 4th runs. This time they ran considerably faster than the first one, but still too slow for bearable use. Both took more than 1 minute, but in the 4th run it occurred to me to measure the time properly, and I found it to take 1 minute and 42 seconds. That was the fastest of them. This time I only took one screenshot, of the discriminated version.

runs_3_4

What I noticed in all runs is the initial peak, about 8 seconds after I start the run, then the network usage drops to very low usage, then, a few seconds later, there is another peak, the biggest concentrated activity, then a long time of almost no activity, when finally Excel shows the file. There is still another peak that begins when the file is shown, and lasts for a few seconds. The offset between the start and end of the run, and the activity in the graph seems to be caused by a delay in the task manager to show the data. I don't know when the file is actually downloaded. I can't also explain why the green graph shows a small activity between peaks, and the red/yellow graph shows none. But the most intriguing of all is the minute long pause between the second and third peak, when I have no idea on what is happening, and certainly could be much faster.

Can someone experienced in Windows networks provide some expert guess on what is the problem with this setup (apart from decade old operating system)? Do you recognize these graph patterns? Can explain it? Have any hint on how to improve performance?

LVS / IPVS difference in ActiveConn since upgrading

Posted: 10 Oct 2021 10:08 PM PDT

I've recently migrated from an old version of LVS / ldirectord (Ultra Monkey) to a new Debian install with ldirectord.

Now the amount of Active Connections is usually higher than the amount of Inactive Connections, it used to be the other way around.

Basically on the old load balancer the connections looked something like:

  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
  -> 10.84.32.21:0               Masq    1      12        252
  -> 10.84.32.22:0               Masq    1      18        368

However since migrating it to the new load balancer it looks more like:

  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
  -> 10.84.32.21:0               Masq    1      313        141
  -> 10.84.32.22:0               Masq    1      276        183

Old load balancer:

  • Debian 3.1
  • ipvsadm 1.24
  • ldirectord 1.2.3

New load balancer:

  • Debian 6.0.5
  • ipvsadm 1.25
  • ldirectord 1.0.3 (I guess the versioning system changed)

Is it because the old load balancer was running a kernel from 2005, and ldirectord from 2004, and things have simply changed in the past 7 - 8 years?

Did I miss some sysctl settings that I should be enforcing for it to behave in the same way?

Everything appears to be working fine but can anyone see an issue with this behaviour?

Thanks in advance!

Additional info: I'm using LVS in masquerading mode, the real servers have the load balancer as their gateway. The real servers are running Apache, which hasn't changed during the upgrade. The boxes themselves show roughly the same amount of Inactive Connections shown in ipvsadm.
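
In case it is relevant, my understanding is that ActiveConn counts connections in the ESTABLISHED state while InActConn counts everything else, so differing connection timeouts would shift the ratio; these are the knobs I have been comparing between the two load balancers (a sketch):

# current LVS connection timeouts (tcp / tcpfin / udp)
ipvsadm -L --timeout

# how many tracked connections are in each state
cat /proc/net/ip_vs_conn | tail -n +2 | awk '{print $8}' | sort | uniq -c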

Replace a Cisco VPN IPSec concentrator with an Ubuntu-box

Posted: 10 Oct 2021 08:01 PM PDT

Is it possible to replace a Cisco VPN IPSec concentrator with Ubuntu and for instance Strongswan?

1) Does Strongswan implement the same protocols that Cisco uses?

2) Can we retrieve keys from the Cisco concentrator and import them to the Ubuntu box? If not, can we generate new keys that suit the equipment at the sites?

3) Are there any performance concerns to think of?
